Multi-Agent Research System
Learn how to architect, orchestrate, and reliably operate a multi-agent research pipeline using the Claude Agent SDK — coordinator delegation, parallel execution, context isolation, and robust failure handling.
Agent Topology
Architecture Principles
Context Isolation
Each subagent runs in its own context window. This means the Search Agent's web crawl results don't bloat the Synthesis Agent's working memory. Isolation prevents cross-contamination and enables parallel execution.
Parallel Execution
The coordinator can fan out tasks to multiple subagents simultaneously. Search 5 topics in parallel, then collect results. The Agent SDK supports concurrent spawning — dramatically cutting total wall time vs. sequential operation.
Structured Handoffs
Agents communicate through structured data, not raw text. The Search Agent returns a typed SearchResult[], the Analyst returns Finding[]. Schema contracts prevent silent corruption as data flows between agents.
Retry & Fallback Logic
Subagents fail silently in naive systems. Build retry logic at the coordinator level: if Search Agent returns empty results, retry with a reformulated query. If it fails twice, the coordinator logs the gap and proceeds with partial data.
Grounding & Citations
Every claim in the final report must trace back to a source URL. The Analyst Agent extracts claim–citation pairs, the Synthesis Agent preserves the chain, and the Report Agent surfaces them. No ungrounded assertions allowed.
Max Turns & Budget
Every subagent invocation must have a max_turns limit and an optional token budget. Without caps, a misbehaving agent can loop forever. The coordinator checks turn counts and terminates runaway agents proactively.
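A minimal coordinator-side sketch of these caps. It does not use any SDK-specific option; `step` is a hypothetical coroutine that runs one agent turn and returns a result (or `None` while the agent is still working):

```python
import asyncio

async def run_capped(step, *, max_turns=8, timeout=120.0):
    """Drive a subagent turn-by-turn with a turn cap and wall-clock timeout."""
    async def drive():
        for _ in range(max_turns):
            result = await step()
            if result is not None:   # agent finished within budget
                return result
        return None                  # turn budget exhausted

    try:
        return await asyncio.wait_for(drive(), timeout)
    except asyncio.TimeoutError:
        return None                  # runaway agent terminated; coordinator logs the gap
```

Returning `None` rather than raising keeps a single runaway agent from crashing the coordinator's collection step.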
Reference Implementation
Key Orchestration Patterns
Fan-Out / Fan-In
The coordinator “fans out” a set of tasks to multiple subagents running in parallel, then “fans in” by collecting all results before proceeding to the next phase. This is the core pattern of the research system — search queries run in parallel, analyst agents run in parallel per-source.
Implementation: use asyncio.gather(*jobs, return_exceptions=True). The return_exceptions=True flag is critical — without it, a single failed agent crashes the whole gather.
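A minimal sketch of the fan-out/fan-in loop with both safeguards — `return_exceptions=True` plus a concurrency cap. Here `agent_fn` stands in for any subagent call:

```python
import asyncio

async def fan_out(agent_fn, tasks, max_concurrency=5):
    """Run agent_fn over tasks in parallel, capped by a semaphore.

    Returns (results, failed_tasks). Exceptions are captured per-task
    instead of crashing the whole gather.
    """
    sem = asyncio.Semaphore(max_concurrency)

    async def run_one(task):
        async with sem:
            return await agent_fn(task)

    results = await asyncio.gather(*(run_one(t) for t in tasks),
                                   return_exceptions=True)
    ok = [r for r in results if not isinstance(r, BaseException)]
    failed = [t for t, r in zip(tasks, results) if isinstance(r, BaseException)]
    return ok, failed
```

Because `gather` preserves input order, zipping tasks against results recovers exactly which tasks failed, so the coordinator can retry or log them.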
Tip: use asyncio.Semaphore(5) to cap concurrent agents at 5.

Typed Schema Handoffs
Every inter-agent handoff should use a typed schema (TypedDict in Python, Zod in TypeScript). This prevents the most common failure mode: Agent A returns a JSON object with slightly different field names than Agent B expects, causing a silent KeyError.
Pattern: define all schemas in a central schemas.py. Each agent imports only the types it needs. The coordinator validates agent outputs against the schema before passing to the next stage.
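A sketch of that contract in Python; the field names are illustrative, and your central schemas.py will differ:

```python
from typing import TypedDict

class SearchResult(TypedDict):
    url: str
    title: str
    snippet: str

class Finding(TypedDict):
    claim: str
    source_url: str
    credibility_score: float

def validate(obj: dict, schema: type) -> dict:
    """Coordinator-side gate: reject outputs whose keys drift from the schema."""
    expected, actual = set(schema.__annotations__), set(obj)
    if expected != actual:
        raise ValueError(
            f"{schema.__name__} mismatch: "
            f"missing={expected - actual}, extra={actual - expected}"
        )
    return obj
```

Failing loudly at the handoff turns the "Agent A renamed a field" failure mode into an immediate, attributable error instead of a silent downstream KeyError.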
Phase Checkpointing
Multi-agent pipelines can run for 3-5 minutes. If the Report Agent fails after 4 minutes of successful work, you don’t want to restart from scratch. Checkpoint the outputs of each phase to persistent storage.
Pattern: after each pipeline phase, serialize the output to a JSON file keyed by task_id + phase. On startup, check if a checkpoint exists and resume from the last completed phase.
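A minimal sketch of that checkpoint layer, using a /tmp/research directory as in this section:

```python
import json
from pathlib import Path

CKPT_ROOT = Path("/tmp/research")

def ckpt_path(task_id: str, phase: int) -> Path:
    return CKPT_ROOT / task_id / f"phase_{phase}.json"

def save_checkpoint(task_id: str, phase: int, output) -> None:
    """Serialize one phase's output, keyed by task_id + phase."""
    path = ckpt_path(task_id, phase)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(output))

def load_checkpoint(task_id: str, phase: int):
    """Return the checkpointed output, or None if the phase never completed."""
    path = ckpt_path(task_id, phase)
    return json.loads(path.read_text()) if path.exists() else None
```

On startup the coordinator calls load_checkpoint for each phase in order and resumes at the first one that returns None.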
Tip: checkpoint to /tmp/research/{task_id}/phase_{n}.json. Costs ~10 lines of code; saves huge amounts of wasted compute on failure.

Coordinator-Level Retries
Subagents should NOT retry themselves. Retry logic belongs in the coordinator, where it can apply strategy: reformulate the query, wait and retry, or try a different agent. A self-retrying subagent can loop indefinitely.
Pattern: coordinator calls subagent, gets empty/error result, reformulates the input, calls again once. If still empty, the coordinator logs a gap. Never a third retry — diminishing returns.
Source Credibility Scoring
Not all sources are equal. Give the Search and Analyst Agents a credibility scoring tool. Reject sources below a threshold (e.g. 0.3) at the analysis phase, before they reach synthesis.
Credibility signals: domain type (.edu=high, .gov=high, known news orgs=medium, personal blogs=low), recency, peer-review status, citation count.
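One way to turn the domain-type signal into a score; the hosts and weights below are illustrative stand-ins, not a vetted credibility model (a real scorer would also factor in recency, peer review, and citation count):

```python
from urllib.parse import urlparse

HIGH_SUFFIXES = {".edu": 0.9, ".gov": 0.9}          # institutional domains
MEDIUM_HOSTS = {"reuters.com", "apnews.com"}         # example known news orgs

def credibility_score(url: str) -> float:
    host = urlparse(url).netloc.lower()
    for suffix, score in HIGH_SUFFIXES.items():
        if host.endswith(suffix):
            return score
    if any(host == h or host.endswith("." + h) for h in MEDIUM_HOSTS):
        return 0.6
    return 0.2  # unknown/personal domains fall below the 0.3 threshold

def passes_threshold(url: str, threshold: float = 0.3) -> bool:
    return credibility_score(url) >= threshold
```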
Conflict Resolution
When sources conflict on factual claims, the Synthesis Agent must have an explicit resolution strategy — not just “pick one.” The preferred hierarchy: peer-reviewed > institutional report > established news > blog. When sources of equal credibility conflict, flag it as “contested” in the report.
Never instruct the synthesis agent to “use your best judgment” on factual conflicts. Be explicit: “When sources conflict, weight by credibility_score. If scores are within 0.1, flag as contested.”
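A sketch of that explicit rule, assuming each finding carries a credibility_score as described above:

```python
def resolve_conflict(findings):
    """Resolve findings that disagree on one factual claim.

    findings: list of {"value", "source_url", "credibility_score"} dicts.
    Weight by credibility_score; if the top two scores are within 0.1
    and the values differ, flag the claim as contested.
    """
    ranked = sorted(findings, key=lambda f: f["credibility_score"], reverse=True)
    best = ranked[0]
    contested = (
        len(ranked) > 1
        and ranked[0]["credibility_score"] - ranked[1]["credibility_score"] <= 0.1
        and ranked[0]["value"] != ranked[1]["value"]
    )
    return {"value": best["value"], "source_url": best["source_url"],
            "contested": contested}
```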
Tip: maintain an unresolved_conflicts[] list; the Report Agent should surface these explicitly.

What Goes Wrong — And How to Fix It
A claim exits the Synthesis Agent without a source URL. The Report Agent surfaces it as an ungrounded assertion. At scale, 15-20% of claims lose citations in naive implementations.
The Synthesis Agent receives full document content from 25+ sources. Even at 1,000 tokens/doc, that’s 25,000 tokens before any reasoning. The agent silently truncates older findings.
A Search Agent’s web_search returns 0 results. Without max_turns, the agent loops attempting reformulations indefinitely. One stuck agent blocks the coordinator’s gather() call.
Agent A starts returning source_link instead of source_url after a prompt change. Agent B’s parser fails silently, treating all claims as uncited.
The Report Agent “enriches” the synthesis with facts from its training data — not from sources. The final report looks good but contains ungrounded statistics.
Fanning out 20 Search Agents simultaneously causes an API rate limit error. All 20 agents fail. The coordinator has no results.
How to Measure a Working Research System
Why each metric matters
Citation Coverage ≥85%
Every factual claim in the final report must trace to a URL. Below 85% means the Synthesis or Report Agent is hallucinating content. Enforce this with a post-processing validator that checks citation density per paragraph.
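A rough paragraph-level validator along those lines, treating "contains a URL" as a proxy for "cited" (a real checker would match individual claims against the citation list):

```python
import re

URL_RE = re.compile(r"https?://\S+")

def citation_coverage(paragraphs):
    """Fraction of paragraphs that contain at least one source URL."""
    if not paragraphs:
        return 0.0
    cited = sum(1 for p in paragraphs if URL_RE.search(p))
    return cited / len(paragraphs)

def check_report(paragraphs, threshold=0.85):
    """Gate the final report on the >=85% coverage target."""
    return citation_coverage(paragraphs) >= threshold
```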
Pipeline Time P90 <3min
Parallel execution is your biggest lever here. If you run search queries sequentially, a 6-query plan takes 6x longer. asyncio.gather() with proper rate-limit handling is non-negotiable for production systems.
Sources Per Angle ≥4
Research quality degrades when an angle is covered by only 1-2 sources. The Synthesis Agent must flag under-covered angles. If fewer than 4 sources address a key angle, trigger a follow-up Search Agent run.
Unhandled Failures <5%
Some failures are expected (paywalled URLs, rate limits). But they must be handled gracefully — logged, retried once, then skipped with a gap note. Any exception that crashes the pipeline entirely is a critical bug.