Multi-Agent Orchestration: Running 10 Agents That Beat 1.

Why a coordinator with nine specialised sub-agents outperforms a single large model. Real patterns from Claude Code's Dynamic Workflows, Hermes' distributed runtime, and OpenClaw's sub-agent architecture.

TL;DR

The big shift in agentic coding this year is not a smarter model. It is the realisation that a team of narrow, specialised agents tends to do better work than one general agent running on a bigger model. [Claude Code's Dynamic Workflows on Opus 4.8](https://www.anthropic.com/news/claude-opus-4-8), [Hermes' distributed runtime](https://github.com/nousresearch/hermes-agent), and [OpenClaw's scheduled sub-agents](https://docs.openclaw.ai/automation/cron-jobs) each get there by a different road, but they all land on the same idea: split the job up, give each piece to a focused worker, and let a coordinator stitch the results together.

Key takeaways

  • A team of focused, cheaper agents under one coordinator is becoming the dominant pattern in agentic coding, though the claim that they beat a single larger model is an editorial read, not a benchmark.
  • Specialisation works because it sidesteps context overflow and attention dilution: each agent has one well-defined job, and only the coordinator pays for a bigger model.
  • The Coordinator-Router-Specialist stack is the common backbone; competitive redundancy, learning agents (Hermes), cron-scheduled hierarchies (OpenClaw), and tmux sessions are variations on it.
  • Multi-agent orchestration carries 20-30% coordination overhead and a 1.5-3x cost multiplier (working heuristics, not sourced figures), so reserve it for jobs of roughly 5+ files or 3+ concerns.
  • Watch coordination ratio, conflict rate, quality delta, and cost multiplier to know whether the extra machinery is paying off.

Analysis

For most of the last two years, the race in AI coding tools was about who had the biggest, smartest single model. Anthropic, OpenAI, and a handful of open-weight labs kept shipping models that could hold more in their heads at once. The pitch was simple: hand it the whole problem, and it will sort the whole problem.

That pitch is quietly being replaced. The teams getting the most out of these tools in 2026 are not feeding everything to one model. They are running small fleets of agents, each with a single job, coordinated by one smart agent on top. Think of it less like hiring one brilliant generalist and more like running a small workshop where one person plans, others build, test, and document, and the planner keeps everyone pointed the same way.

For an Australian business team, the practical takeaway is this. The interesting question is no longer "which model is best." It is "how do I get a group of cheaper, focused agents to outwork one expensive one." The reported wins are real enough to pay attention to, and the trade-offs (extra cost, coordination friction) are real enough that you should not turn it on for every job.

Worth a caveat up front. The headline that "ten specialised agents beat one larger agent" is an editorial reading of where things are heading, not a published benchmark. Treat it as a direction the field is moving in, not a settled fact.

Why Multiple Agents Win

A single large model has to carry the whole problem in its context window at once. It is reasoning about architecture, syntax, testing, documentation, and deployment in the same breath. That sets up two ways to fail. The first is context overflow, where the model loses the thread on details that mattered. The second is attention dilution, where it does an okay job on everything and a great job on nothing.

Splitting the work flips that around. A coordinator agent holds the high-level plan. Router agents hand out the sub-tasks. Specialist agents each own a narrow patch: one writes tests, one handles migrations, one keeps the docs current. Because each specialist has a tightly defined job, it can run on a smaller, cheaper model. The coordinator runs on a bigger model and spends all of its attention on the harder problem, which is integration and resolving conflicts between the specialists.

Pattern 1: The Coordinator-Router-Specialist Stack

This is the pattern you see most often across all three platforms:

Coordinator (Opus 4.8 / Hermes 3 / GPT-4.1)
  ├── Router: Task decomposition and dispatch
  ├── Specialist A: Code generation
  ├── Specialist B: Test generation
  ├── Specialist C: Documentation
  ├── Specialist D: Migration scripts
  └── Specialist E: Dependency analysis

(One note on the diagram: Hermes 3 is a real open-weight model family from Nous Research, but it predates the newer Hermes Agent runtime mentioned later. Listing it as a peer coordinator alongside Opus 4.8 and GPT-4.1 is illustrative, not a documented setup.)

The coordinator takes the high-level request ("migrate from Express to Fastify"), works out a plan, and farms each sub-task to the right specialist. Specialists run in parallel wherever the work allows. The coordinator then reviews what comes back, sorts out conflicts (say, Specialist A changed an interface that Specialist B's tests rely on), and assembles the finished result.

In Claude Code, this runs through Dynamic Workflows. The snippet below is illustrative pseudo-CLI rather than a real command. In practice, Dynamic Workflows are JavaScript scripts that Claude writes and a runtime executes, not a declarative --specialist flag, so read this as a sketch of the idea:

# Define a workflow with multiple specialist subagents
claude workflow create --name "migration" \
  --specialist "code:code-gen" \
  --specialist "test:test-gen" \
  --specialist "docs:doc-gen" \
  --coordinator opus-4.8

# Execute with a high-level prompt
claude workflow run migration "migrate from Express to Fastify"

Pattern 2: Competitive Redundancy

For code paths you cannot afford to get wrong, some teams run several generator agents on the same task, varying the underlying model or the sampling seed. A judge agent then compares the outputs and either picks the best one or merges them. It costs you 2-3x more, but it catches subtle bugs a single agent would sail past.

Coordinator
  ├── Generator A (Claude Sonnet 4.8)
  ├── Generator B (GPT-4.1)
  ├── Generator C (Hermes 3)
  └── Judge (Opus 4.8): selects best output

One flag on that diagram: "Claude Sonnet 4.8" is a rumoured model that has not shipped. The latest Sonnet available is 4.6, and the released 4.8 model is Opus, not Sonnet. Read the Sonnet entry as a placeholder for whatever current generator you actually have on hand.
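
As a rough sketch of the generate-then-judge idea, the Python below runs three hypothetical generators and keeps the candidate that passes the most checks. Two things here are assumptions made for the example: the generators return hard-coded strings rather than calling models, and the judge is a simple test runner swapped in for the model-based judge in the diagram.

```python
from typing import Callable, List

# Hypothetical generators: each returns a candidate implementation as source text.
def generator_a(task: str) -> str:
    return "def add(a, b): return a - b"            # deliberately buggy

def generator_b(task: str) -> str:
    return "def add(a, b): return a + b"            # correct

def generator_c(task: str) -> str:
    return "def add(a, b): return abs(a) + abs(b)"  # wrong for negative inputs

def judge_score(candidate: str) -> int:
    """Judge step: count how many checks the candidate passes."""
    namespace: dict = {}
    exec(candidate, namespace)  # acceptable for a toy; sandbox real model output
    add = namespace["add"]
    checks = [(1, 2, 3), (0, 0, 0), (-1, 1, 0)]
    return sum(1 for a, b, want in checks if add(a, b) == want)

def pick_best(task: str, generators: List[Callable[[str], str]]) -> str:
    """Run every generator on the same task and keep the highest-scoring output."""
    candidates = [generate(task) for generate in generators]
    return max(candidates, key=judge_score)

best = pick_best("write add(a, b)", [generator_a, generator_b, generator_c])
print(best)
```

The extra cost comes from running the generation step once per generator, which is where the 2-3x figure above comes from.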

Pattern 3: Learning Specialist Agents

Hermes takes the specialist idea further with its learning loop. Its specialist agents do not only run tasks, they learn from them. When the test specialist spots a recurring bug pattern, it writes up a skill signature and passes it to the code specialist. After a while, the code specialist starts heading off that pattern before it happens. Of the three platforms covered here, only Hermes has this agent-to-agent learning, and it is the reason its multi-agent setups tend to improve over time while static orchestrations stay where they started.

# Hermes agent-to-agent skill sharing
hermes.skills.share(
    from_agent="test-specialist",
    to_agent="code-specialist",
    skill_signature="avoid-null-returns-in-async-functions",
    confidence=0.94
)
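
To make the hand-off concrete, here is a small in-memory sketch of one agent passing a learned pattern to another. It is not Hermes code: the `Agent` class, the `share_skill` function, and the 0.8 confidence cut-off are all assumptions invented for this illustration.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class Agent:
    """Hypothetical agent that remembers skills as signature -> confidence."""
    name: str
    skills: Dict[str, float] = field(default_factory=dict)

def share_skill(sender: Agent, receiver: Agent, signature: str,
                min_confidence: float = 0.8) -> bool:
    """Copy a skill across only if the sender is confident enough in it."""
    confidence = sender.skills.get(signature, 0.0)
    if confidence < min_confidence:
        return False
    # Keep the highest confidence the receiver has seen for this skill.
    receiver.skills[signature] = max(confidence, receiver.skills.get(signature, 0.0))
    return True

tester = Agent("test-specialist", {"avoid-null-returns-in-async-functions": 0.94})
coder = Agent("code-specialist")

shared = share_skill(tester, coder, "avoid-null-returns-in-async-functions")
print(shared, coder.skills)
```

The threshold is the design choice worth noticing: without one, a pattern seen once would spread to every agent as if it were established practice.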

Pattern 4: Cron-Scheduled Agent Hierarchies

OpenClaw's sub-agent architecture can run agents on a schedule. A parent agent spawns child agents that fire on cron timers: a daily dependency audit, an hourly security scan, a weekly docs review. Each child reports back to the parent, which gathers the findings and decides whether a human needs to step in.

{
  "subAgents": [
    {
      "name": "security-scanner",
      "schedule": "0 * * * *",
      "skill": "security-audit",
      "reportTo": "main-agent",
      "threshold": "critical-only"
    }
  ]
}
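
The schedule `0 * * * *` is standard five-field cron syntax for minute zero of every hour. How the parent handles what comes back can be sketched as below. This is an illustration only: the report format is hypothetical, and reading `"critical-only"` as "escalate only findings marked critical" is an assumption about what that threshold means.

```python
import json
from typing import Dict, List

# Same shape as the config above, loaded from a string for the example.
CONFIG = json.loads("""
{
  "subAgents": [
    {"name": "security-scanner", "schedule": "0 * * * *",
     "skill": "security-audit", "reportTo": "main-agent",
     "threshold": "critical-only"}
  ]
}
""")

def should_escalate(threshold: str, severity: str) -> bool:
    """Assumed rule: 'critical-only' passes critical findings, anything else passes all."""
    if threshold == "critical-only":
        return severity == "critical"
    return True

def gather(config: Dict, reports: List[Dict]) -> List[Dict]:
    """Parent step: keep only the child findings that need a human."""
    thresholds = {agent["name"]: agent["threshold"] for agent in config["subAgents"]}
    return [
        report for report in reports
        if should_escalate(thresholds.get(report["agent"], ""), report["severity"])
    ]

# Hypothetical reports from one hourly run.
reports = [
    {"agent": "security-scanner", "severity": "low", "summary": "outdated dev dependency"},
    {"agent": "security-scanner", "severity": "critical", "summary": "leaked API key in repo"},
]

for finding in gather(CONFIG, reports):
    print(finding["summary"])
```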

Pattern 5: tmux-Based Multi-Agent Sessions

If you live in the terminal, Claude Code can be driven across a tmux session, with each agent in its own pane and the coordinator passing messages through tmux. It works well for long-running jobs where you want eyes on each agent's progress. One caveat: the command below is illustrative. Practitioners routinely run several Claude Code panes in tmux by hand, but a dedicated claude multi-agent --tmux flag is not a documented official feature, so do not assume it works verbatim:

# Launch 3 agents in a tmux session
claude multi-agent --tmux --agents 3 --task "refactor monolith into microservices"
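
The by-hand version can be scripted with ordinary tmux commands. The Python sketch below only builds and prints the commands rather than running them. It assumes tmux's default window and pane numbering (both starting at 0) and that `claude` is on the path; the function name and session name are made up for the example.

```python
import shlex
from typing import List

def tmux_commands(session: str, agents: int, agent_cmd: str = "claude") -> List[List[str]]:
    """Build (but do not run) tmux commands that give each agent its own pane."""
    commands = [["tmux", "new-session", "-d", "-s", session]]
    for _ in range(agents - 1):
        commands.append(["tmux", "split-window", "-t", session])
        # Re-tile after every split so later splits do not run out of room.
        commands.append(["tmux", "select-layout", "-t", session, "tiled"])
    for pane in range(agents):
        # Targets assume default numbering: window 0, panes 0..agents-1.
        target = f"{session}:0.{pane}"
        commands.append(["tmux", "send-keys", "-t", target, agent_cmd, "Enter"])
    return commands

for command in tmux_commands("agents", 3):
    print(shlex.join(command))
```

To launch the session for real, each command list could be passed to `subprocess.run(command, check=True)`, followed by `tmux attach -t agents` in your terminal. Passing messages between the panes is a separate problem that this sketch does not address.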

The Overhead Problem

None of this is free. Coordination overhead, meaning the time agents spend talking to each other, sorting out conflicts, and waiting on dependencies, can eat 20-30% of total run time. As a rough rule of thumb, the break-even point sits somewhere around 5 or more files touched, or 3 or more distinct concerns (code, tests, docs, migrations). Below that, one agent is faster and cheaper. Treat these as practitioner heuristics rather than figures from a published study, since they are not independently sourced.
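
Those rules of thumb are simple enough to write down as code. The sketch below encodes the break-even heuristic and a back-of-envelope wall-time estimate. The function names are invented here, and the estimate assumes the work splits perfectly evenly across agents, which real jobs rarely do.

```python
def should_use_multi_agent(files_touched: int, concerns: int) -> bool:
    """Break-even heuristic from the text: 5+ files or 3+ distinct concerns."""
    return files_touched >= 5 or concerns >= 3

def estimated_wall_minutes(work_minutes: float, agents: int, overhead: float = 0.25) -> float:
    """Wall time when `overhead` is the share of the total run spent coordinating.

    Assumes the work divides evenly across agents (an idealisation).
    """
    parallel_work = work_minutes / agents
    return parallel_work / (1.0 - overhead)

print(should_use_multi_agent(files_touched=2, concerns=1))   # small fix: stay single-agent
print(should_use_multi_agent(files_touched=12, concerns=4))  # migration: worth orchestrating
print(estimated_wall_minutes(work_minutes=60, agents=4))     # 15 min of work plus coordination
```

With 60 minutes of work spread over four agents and a quarter of the run lost to coordination, the estimate comes out at 20 minutes rather than the ideal 15.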

Measuring Multi-Agent Performance

You cannot manage what you do not measure, and orchestration is no exception. The metrics worth watching:

  • Coordination ratio: time spent coordinating versus time spent actually working (target: under 25%)
  • Conflict rate: share of sub-agent outputs that need reconciling (target: under 10%)
  • Quality delta: bug rate of multi-agent output against single-agent output on the same task
  • Cost multiplier: total token cost of multi-agent versus single-agent (usually 1.5-3x)

A reminder on those targets: the under-25% and under-10% figures, like the cost multipliers, are reasonable working benchmarks rather than vendor-published numbers, so calibrate them against your own runs.
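
The four metrics can be computed from a handful of numbers per run. The sketch below is one way to do it, with some assumptions spelled out: the `RunStats` fields are hypothetical, coordination ratio is taken as coordination time over total run time, and quality delta is the multi-agent bug count minus the single-agent bug count, so negative is better.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class RunStats:
    """Hypothetical per-task numbers collected from one comparison run."""
    coordination_seconds: float
    work_seconds: float
    outputs_total: int
    outputs_reconciled: int
    multi_agent_bugs: int
    single_agent_bugs: int
    multi_agent_tokens: int
    single_agent_tokens: int

def orchestration_metrics(stats: RunStats) -> Dict[str, float]:
    total_seconds = stats.coordination_seconds + stats.work_seconds
    return {
        # Share of the whole run spent coordinating (target: under 0.25).
        "coordination_ratio": stats.coordination_seconds / total_seconds,
        # Share of sub-agent outputs that needed reconciling (target: under 0.10).
        "conflict_rate": stats.outputs_reconciled / stats.outputs_total,
        # Negative means the multi-agent run shipped fewer bugs.
        "quality_delta": stats.multi_agent_bugs - stats.single_agent_bugs,
        # Token cost relative to the single-agent baseline.
        "cost_multiplier": stats.multi_agent_tokens / stats.single_agent_tokens,
    }

run = RunStats(
    coordination_seconds=120, work_seconds=480,
    outputs_total=40, outputs_reconciled=3,
    multi_agent_bugs=2, single_agent_bugs=5,
    multi_agent_tokens=900_000, single_agent_tokens=400_000,
)
metrics = orchestration_metrics(run)
within_targets = metrics["coordination_ratio"] < 0.25 and metrics["conflict_rate"] < 0.10
print(metrics, within_targets)
```

In this made-up run the orchestration stays inside both targets and ships three fewer bugs, at 2.25 times the token cost. Whether that trade is worth it depends on what a bug costs you.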

Claude Code's Dynamic Workflows ship with telemetry built in. Hermes agents log coordination events to an FTS5 session database. OpenClaw's parent agents keep tabs on their children through a file-based memory system.

The Future: Meta-Agents

The next step, still mostly speculative, is meta-agents: agents that design the topology for a given task. "This job needs two code agents, one test agent, and a migration agent" should be a call an agent makes, not a human. Early reported experiments with Claude Code's Task System suggest it is workable, with a meta-agent reading the task, choosing the specialist mix, watching execution, and rebalancing when a specialist is struggling. If that holds up, multi-agent orchestration shifts from something you build to something you simply ask for. For now, treat it as a direction of travel rather than a shipped feature.

What to do next

  1. Pick the smallest useful workflow that proves the pattern.
  2. Write down the owner, data boundary, review point, and success measure.
  3. Review the result after the first real run and decide whether to scale, change, or stop.

AI Kick Start is an Illawarra-based AI studio in Figtree, helping businesses across Wollongong, Shellharbour and Kiama and right across Australia put AI to work.
