Matching AI agent architecture to the right problem
Executive overview
AI agents often fail not because the model is too weak or the prompt is wrong, but because the architecture is mismatched to the problem. Two engineering write-ups, one from Anthropic and one from Cognition, reach opposite conclusions about multi-agent systems, and each is correct for its own domain.
Multi-agent systems win at research. Single agents win at coding. The deciding factor is whether subtasks are independent (distribute) or interdependent (centralise).
Matching architecture to problem type is the primary determinant of agent success.
Why multi-agent systems excel at research
- Anthropic's deep research feature routes one complex question through an orchestrator agent that breaks it into sub-questions
- Sub-agents run in parallel, each researching a subset independently
- A dedicated citation sub-agent verifies every source against its claim
- Results are synthesised into long-term memory to survive context window limits
- On Anthropic's internal research eval, the multi-agent system outperformed a single agent by about 90%, while using roughly 15x the tokens of a chat interaction
- Distributed context means each agent works within its window; no single agent is overwhelmed
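The fan-out pattern described above can be sketched in a few lines. This is a minimal illustration, not Anthropic's implementation: `decompose` and `research` are hypothetical stand-ins for model calls, and the three "angles" are invented for the example.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Finding:
    sub_question: str
    summary: str


def decompose(question: str) -> list[str]:
    """Hypothetical planner; a real orchestrator would ask a model to split the question."""
    angles = ("background", "evidence", "counterpoints")
    return [f"{question} (angle: {angle})" for angle in angles]


async def research(sub_question: str) -> Finding:
    """Hypothetical sub-agent; stands in for a model call with its own context window."""
    await asyncio.sleep(0)  # placeholder for model and tool latency
    return Finding(sub_question, f"notes on: {sub_question}")


async def orchestrate(question: str) -> list[Finding]:
    sub_questions = decompose(question)
    # Sub-agents share no state, so they can safely run concurrently.
    return await asyncio.gather(*(research(q) for q in sub_questions))


findings = asyncio.run(orchestrate("How do agents manage context?"))
for finding in findings:
    print(finding.summary)
```

Because each sub-agent receives only its own sub-question, no single context window has to hold the whole investigation; the orchestrator sees only the returned summaries.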
Why multi-agent systems fail at coding
- Cognition (makers of Devin) found multi-agent coding produces unreliable results
- Coding tasks are interdependent: every action taken by one sub-agent affects all others
- Example: for a Flappy Bird clone, one sub-agent builds a Mario-style world while another builds the bird and its physics on different assumptions; the merged output is incoherent
- There is no safe isolation boundary between parallel coding tasks
- Errors compound silently; no sub-agent can see the decisions the others have made, so conflicts surface only at merge time
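The failure mode can be shown with a toy example. The two sub-agents below are hypothetical; each makes an implicit decision (here, a coordinate unit invented for illustration) that the other never sees, and the conflict only becomes visible when their outputs are compared.

```python
def background_agent() -> dict:
    # Hypothetical sub-agent: sees only its own subtask and assumes a tile grid.
    return {"component": "background", "coordinate_unit": "tiles", "gravity": None}


def physics_agent() -> dict:
    # Hypothetical sub-agent: sees only its own subtask and assumes pixel coordinates.
    return {"component": "bird", "coordinate_unit": "pixels", "gravity": 9.8}


def find_conflicts(outputs: list[dict], shared_keys: list[str]) -> list[str]:
    """Return the shared keys on which the sub-agents made different decisions."""
    conflicts = []
    for key in shared_keys:
        values = {out[key] for out in outputs if out[key] is not None}
        if len(values) > 1:
            conflicts.append(key)
    return conflicts


outputs = [background_agent(), physics_agent()]
conflicts = find_conflicts(outputs, ["coordinate_unit", "gravity"])
print(conflicts)
```

In a real codebase the shared assumptions are rarely listed this explicitly, which is why such conflicts tend to go unnoticed until the pieces are combined.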
How Devin handles coding with a single agent
- Single agent receives the full task, creates a plan, then executes linearly
- After each step, the agent compresses key decisions and context before passing to the next step
- Compressed context keeps the window manageable without losing critical state
- Centralised context ensures every action builds correctly on what came before
- Reliability and consistency — not speed or breadth — are the priority
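A minimal sketch of the plan, execute and compress loop described above. It is not Cognition's code: `execute` and `compress` are hypothetical stand-ins, and "keep the first line of each step's output" is a deliberately crude substitute for a model-written summary.

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    # Compressed record of key decisions, carried from step to step.
    decisions: list[str] = field(default_factory=list)


def execute(step: str, state: AgentState) -> str:
    """Hypothetical executor; it can read every earlier compressed decision."""
    header = f"done {step} (knowing {len(state.decisions)} earlier decisions)"
    return header + "\n...verbose tool output that should not be carried forward..."


def compress(state: AgentState, step: str, raw_output: str) -> AgentState:
    """Hypothetical compressor; keeps one line per step instead of the full output."""
    key_line = raw_output.strip().splitlines()[0]
    state.decisions.append(f"{step}: {key_line}")
    return state


def run(plan: list[str]) -> AgentState:
    state = AgentState()
    for step in plan:  # strictly sequential: each step builds on the last
        raw_output = execute(step, state)
        state = compress(state, step, raw_output)
    return state


state = run(["define schema", "write API", "add tests"])
for decision in state.decisions:
    print(decision)
```

The verbose output of each step is discarded after compression, so the context grows by one short line per step while every later step still sees every earlier decision.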
Decision framework: multi-agent vs single agent
Choose multi-agent when:
- The problem can be broken into genuinely independent sub-problems
- Diverse perspectives from parallel research add value to the final output
- The use case justifies the higher token cost (roughly 15x a chat interaction, per Anthropic)
- Breadth of coverage matters more than tight internal consistency
Choose single agent when:
- Sub-tasks depend on each other — a domino effect exists between steps
- The output must work as a unified whole (e.g., a functioning codebase)
- High reliability on every run is required
- Depth of execution matters more than breadth of perspectives
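The two checklists can be condensed into a small decision function. This is one possible encoding, under stated assumptions: the `TaskProfile` fields and the budget comparison are hypothetical, and only the 15x cost figure comes from the text above.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    subtasks_independent: bool      # can sub-problems be solved in isolation?
    output_must_integrate: bool     # must the result work as one unified whole?
    affordable_cost_multiple: float  # how many times a baseline token budget is acceptable


def choose_architecture(task: TaskProfile, multi_agent_cost_multiple: float = 15.0) -> str:
    # Interdependent steps or a tightly integrated output favour one shared context.
    if not task.subtasks_independent or task.output_must_integrate:
        return "single-agent"
    # Independent subtasks only pay off if the extra token cost is acceptable.
    if task.affordable_cost_multiple < multi_agent_cost_multiple:
        return "single-agent"
    return "multi-agent"


research_task = TaskProfile(True, False, 20.0)
coding_task = TaskProfile(False, True, 20.0)
print(choose_architecture(research_task))
print(choose_architecture(coding_task))
```

The ordering is deliberate: interdependence is checked first because no token budget makes parallel agents safe when their outputs must fit together.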
Evals: how Anthropic measures deep research quality
- Factual accuracy — does the claim match the source?
- Citations — does the source actually support what is cited?
- Completeness — does the answer address all parts of the question?
- Source quality — primary sources and reputable outlets ranked above SEO-optimised sites
- Tool efficiency — are the right tools called at the right moments?
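- Anthropic first experimented with a separate LLM judge for each dimension; a single LLM call with one prompt, returning scores from 0.0 to 1.0 and a pass/fail grade, proved more consistent and better aligned with human judgement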
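A single-judge setup over these five dimensions might look like the sketch below. The prompt wording, the JSON reply shape, the dimension keys and the 0.7 pass threshold are illustrative assumptions, and the judge reply is a canned string rather than a real model call.

```python
import json

DIMENSIONS = [
    "factual_accuracy",
    "citations",
    "completeness",
    "source_quality",
    "tool_efficiency",
]


def build_judge_prompt(question: str, answer: str) -> str:
    """Build one prompt that asks for every dimension in a single call."""
    rubric = "\n".join(f"- {name}: score from 0.0 to 1.0" for name in DIMENSIONS)
    return (
        "Grade the answer on each dimension and reply with JSON only.\n"
        f"{rubric}\n\nQuestion: {question}\n\nAnswer: {answer}"
    )


def parse_verdict(judge_reply: str, pass_threshold: float = 0.7) -> dict:
    """Parse the judge's JSON reply and derive an overall pass/fail grade."""
    scores = json.loads(judge_reply)
    missing = [name for name in DIMENSIONS if name not in scores]
    if missing:
        raise ValueError(f"judge omitted dimensions: {missing}")
    passed = all(scores[name] >= pass_threshold for name in DIMENSIONS)
    return {"scores": scores, "passed": passed}


# Canned reply standing in for a real judge model's output.
reply = (
    '{"factual_accuracy": 0.9, "citations": 0.8, "completeness": 0.75, '
    '"source_quality": 0.6, "tool_efficiency": 0.9}'
)
verdict = parse_verdict(reply)
print(verdict["passed"])
```

Requiring every dimension in one reply keeps the scores mutually consistent, since the judge weighs them against the same reading of the answer.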
Core principle: context management
- Context management is one of the most critical factors in agent performance
- Multi-agent: distributed context — each agent holds a subset, parallel execution, no shared state
- Single agent: centralised context — compressed and passed forward, sequential, fully shared state
- The right context strategy depends entirely on whether the task requires breadth or depth