Matching AI agent architecture to the right problem

Executive overview

AI agents often fail not because the model is too weak or the prompt is wrong, but because the architecture is mismatched to the problem. Two engineering write-ups, one from Anthropic and one from Cognition, reach opposite conclusions about multi-agent systems, and each is correct for its own domain.

Multi-agent systems win at research. Single agents win at coding. The deciding factor is whether subtasks are independent (distribute) or interdependent (centralise).

Matching architecture to problem type is the primary determinant of agent success.

Why multi-agent systems excel at research

Anthropic's deep research feature routes one complex question through an orchestrator agent that breaks it into sub-questions
Sub-agents run in parallel, each researching a subset independently
A dedicated citation sub-agent verifies every source against its claim
Results are synthesised and written to long-term memory so they survive context window limits
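On Anthropic's internal research eval, the multi-agent system outperformed a single agent by about 90% (reported as 90.2%), while using roughly 15x the tokens of an ordinary chat interaction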
Distributed context means each agent works within its window; no single agent is overwhelmed
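
The orchestrator pattern described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Anthropic's implementation: `decompose`, `research`, and `orchestrate` are hypothetical names, and both the planner and the sub-agent are stubs standing in for model calls.

```python
from concurrent.futures import ThreadPoolExecutor


def decompose(question: str) -> list[str]:
    """Hypothetical planner: a real orchestrator would ask a model to split the question."""
    return [
        f"{question} (background)",
        f"{question} (recent developments)",
        f"{question} (open problems)",
    ]


def research(sub_question: str) -> dict:
    """Stub sub-agent: sees only its own sub-question and shares no state with the others."""
    return {"sub_question": sub_question, "finding": f"notes on: {sub_question}"}


def orchestrate(question: str) -> list[dict]:
    sub_questions = decompose(question)
    # The sub-questions are independent, so they can safely run in parallel.
    with ThreadPoolExecutor(max_workers=len(sub_questions)) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(research, sub_questions))


findings = orchestrate("solid-state batteries")
```

Because each call to `research` receives only its own sub-question, no single worker has to hold the whole problem in its context.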

Why multi-agent systems fail at coding

Cognition (makers of Devin) found multi-agent coding produces unreliable results
Coding tasks are interdependent: every action taken by one sub-agent affects all others
Example: asked to build a Flappy Bird clone, one sub-agent builds a Mario-style world while another builds the bird and its physics to different assumptions; the merged output is inconsistent
There is no safe isolation boundary between parallel coding tasks
Errors compound silently; no sub-agent can see the decisions the others have made
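
A small sketch of why this bites, assuming each sub-agent's implicit decisions can be written down as key-value pairs. The `merge` helper and the example values are hypothetical and only illustrate the failure mode; they are not taken from Cognition's write-up.

```python
def merge(outputs: list[dict]) -> tuple[dict, list[str]]:
    """Merge sub-agent outputs and list the keys on which they decided differently."""
    merged: dict = {}
    conflicts: list[str] = []
    for output in outputs:
        for key, value in output.items():
            if key in merged and merged[key] != value:
                conflicts.append(key)
            # The first decision wins; later conflicting ones are only recorded.
            merged.setdefault(key, value)
    return merged, conflicts


# Each sub-agent chose an art style in isolation (illustrative values).
background_agent = {"art_style": "Mario-style pixel art", "tile_size_px": 16}
bird_agent = {"art_style": "flat vector", "sprite": "bird"}

merged, conflicts = merge([background_agent, bird_agent])
```

The conflict on `art_style` only becomes visible at merge time, after both sub-agents have already spent their effort.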

How Devin handles coding with a single agent

A single agent receives the full task, creates a plan, then executes it linearly
After each step, the agent compresses key decisions and context before passing to the next step
Compressed context keeps the window manageable without losing critical state
Centralised context ensures every action builds correctly on what came before
Reliability and consistency — not speed or breadth — are the priority
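
The plan, execute, compress loop above might look like the following. This is a sketch under assumptions, not Devin's actual code: `compress` and `run_single_agent` are hypothetical names, and compression is simulated by keeping only the most recent decisions, where a real agent would more likely use a model to summarise.

```python
def compress(context: list[str], max_items: int = 3) -> list[str]:
    """Stand-in for summarisation: keep only the most recent decisions (an assumption)."""
    return context[-max_items:]


def run_single_agent(plan: list[str]) -> list[str]:
    context: list[str] = []
    for step in plan:
        # Each step runs with the compressed record of what was decided before it.
        decision = f"{step} (given {len(context)} prior decisions)"
        context = compress(context + [decision])
    return context


final_context = run_single_agent(
    ["set up project", "write physics", "draw sprites", "add scoring", "test"]
)
```

The context never grows beyond `max_items` entries, yet every step still starts from a record of earlier decisions rather than from scratch.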

Decision framework: multi-agent vs single agent

Choose multi-agent when:

The problem can be broken into genuinely independent sub-problems
Diverse perspectives from parallel research add value to the final output
The use case justifies the higher token cost (roughly 15x, per Anthropic's figures)
Breadth of coverage matters more than tight internal consistency

Choose single agent when:

Sub-tasks depend on each other — a domino effect exists between steps
The output must work as a unified whole (e.g., a functioning codebase)
High reliability on every run is required
Depth of execution matters more than breadth of perspectives
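
The two checklists can be folded into a small helper. This is one possible encoding, not a rule from either source: the `Task` fields and the `choose_architecture` function are hypothetical, and the default of 15.0 simply reuses the token multiplier quoted in the text.

```python
from dataclasses import dataclass


@dataclass
class Task:
    subtasks_independent: bool    # can the sub-problems be solved in isolation?
    needs_unified_output: bool    # must the result work as one whole, e.g. a codebase?
    affordable_cost_multiple: float  # how many times the baseline token cost is acceptable


def choose_architecture(task: Task, multi_agent_cost_multiple: float = 15.0) -> str:
    # Interdependent steps or a unified deliverable call for centralised context.
    if not task.subtasks_independent or task.needs_unified_output:
        return "single-agent"
    # Independent sub-problems only justify fanning out if the budget allows it.
    if task.affordable_cost_multiple < multi_agent_cost_multiple:
        return "single-agent"
    return "multi-agent"


research_task = Task(subtasks_independent=True, needs_unified_output=False, affordable_cost_multiple=20.0)
coding_task = Task(subtasks_independent=False, needs_unified_output=True, affordable_cost_multiple=20.0)
```

Here `choose_architecture(research_task)` returns `"multi-agent"` and `choose_architecture(coding_task)` returns `"single-agent"`.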

Evals: how Anthropic measures deep research quality

Factual accuracy — does the claim match the source?
Citation accuracy — does the cited source actually support the claim?
Completeness — does the answer address all parts of the question?
Source quality — primary sources and reputable outlets ranked above SEO-optimised sites
Tool efficiency — are the right tools called at the right moments?
Initially, a separate LLM judge scored each of the five dimensions; consolidating them into a single LLM judge improved performance
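
As a sketch of what a single-judge rubric could return, the snippet below collects the five dimensions into one record and derives an overall score and a pass/fail grade. The 0.0 to 1.0 scale, the equal weighting, the 0.7 threshold, and the names `RubricScores` and `grade` are all assumptions made for illustration; the scores themselves would come from the judge model.

```python
from dataclasses import dataclass, fields


@dataclass
class RubricScores:
    factual_accuracy: float
    citation_accuracy: float
    completeness: float
    source_quality: float
    tool_efficiency: float


def grade(scores: RubricScores, pass_threshold: float = 0.7) -> tuple[float, bool]:
    """Average the five dimensions equally and compare against an assumed threshold."""
    values = [getattr(scores, f.name) for f in fields(scores)]
    if any(not 0.0 <= v <= 1.0 for v in values):
        raise ValueError("each score must be between 0.0 and 1.0")
    overall = sum(values) / len(values)
    return overall, overall >= pass_threshold


# Illustrative scores, as if returned by one judge call.
overall, passed = grade(RubricScores(0.9, 0.8, 0.7, 0.6, 1.0))
```

Keeping all five dimensions in one record mirrors the single-judge setup: one call, one structured result.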

Core principle: context management

Context management is one of the most critical factors in agent performance
Multi-agent: distributed context — each agent holds a subset, parallel execution, no shared state
Single agent: centralised context — compressed and passed forward, sequential, fully shared state
The right context strategy depends entirely on whether the task requires breadth or depth

