Back to news

Code

Agentic Coding 2.0: Agents Conducting Other Agents.

The next evolution of agentic coding is not better single agents. It is meta-agents that orchestrate specialist agents, review their work, and maintain quality standards across complex engineering projects.

AI Kick Start editorial image for Agentic Coding 2.0: Agents Conducting Other Agents.

Decision

Start narrow

Use the article to decide the smallest useful workflow worth testing before expanding the system.

Risk to watch

Hype drift

Avoid turning a practical adoption step into a broad transformation promise nobody can verify.

Proof to collect

Business signal

Write down the owner, data boundary, review point, and measurable outcome before the first build.

TL;DR

TL;DR: The next evolution of agentic coding is not better single agents. It is meta-agents that orchestrate specialist agents, review their work, and maintain quality standards across complex engineering projects.

Key takeaways

  • Briefing: For most of the last two years, an "AI coding assistant" meant one thing: a single bot that wrote code while you watched.
  • The Conductor Pattern: The conductor pattern is the core idea.
  • Quality Gates: The critic is what keeps this honest.
  • Dynamic Rebalancing: This is where [Claude Code's dynamic workflows](https://claude.com/blog/introducing-dynamic-workflows-in-claude-code) earn their keep.
  • Self-Correction Loops: When the critic rejects something, a good conductor doesn't just bounce it back for a redo.

Briefing

For most of the last two years, an "AI coding assistant" meant one thing: a single bot that wrote code while you watched. Useful, sometimes impressive, but still one set of hands on the keyboard. That picture is starting to change.

The newer idea is to stop asking one agent to do everything and instead put one agent in charge of several. A lead agent holds the plan, hands jobs to specialists, checks what comes back, and keeps the whole thing on standard. Some people are calling this agentic coding 2.0. The label is marketing, but the shift underneath it is real, and it borrows its shape from something every business already understands: a team with a manager.

For an Australian business team weighing up where AI fits, the "so what" is straightforward. The promise is that more of the routine build work (the CRUD APIs, the component libraries, the deployment plumbing) can run with less hand-holding, while a person stays in the loop for the calls that actually need judgment. The catch is that a lot of the detail floating around, including specific model names, is ahead of what has actually shipped. So it pays to separate what is built from what is pitched.

Here is the architecture, the quality controls, and the honest limits.

The Conductor Pattern

The conductor pattern is the core idea. Four roles do the work.

The Conductor (the meta-agent): holds the high-level goal, tracks the project state, decides what happens next, and judges quality. The argument is to run this on the strongest model you have access to. Claude Opus 4.8, which Anthropic released on 28 May 2026 as its most capable model, is a fair fit for that seat. Other names get thrown around for this role too, including GPT-4.1 and Nous Research's Hermes 3, though both are weaker choices than the framing suggests: GPT-4.1 was retired from ChatGPT in February 2026 and superseded by newer GPT-5 models, and Hermes 3 is a 2024 open-weight Llama fine-tune rather than a current closed flagship.

Section Leads (the domain specialists): each one owns a patch, such as backend, frontend, infrastructure, testing, or documentation. They take the conductor's direction and turn it into specific tasks for their area.

Players (the execution agents): they write the code, run the tests, and generate the docs. The pitch is to run these on smaller, faster models, since their jobs are tightly defined.

The Critic (the quality agent): reviews everything before it is accepted. It looks for bugs, style breaches, security holes, and work that drifts from the agreed architecture. It can reject any output and send it back.

A worked example often looks like the tree below. Note that the specific model versions shown for the leads and players (Sonnet 4.8, Haiku 4.8) are illustrative and do not match Anthropic's current lineup, which tops out at Sonnet 4.6 and Haiku 4.5 as of mid-2026:

Conductor (Opus 4.8)
  |-- Backend Lead (Sonnet 4.8)
  |     |-- API Agent (Haiku 4.8)
  |     |-- Database Agent (Haiku 4.8)
  |     |-- Auth Agent (Haiku 4.8)
  |-- Frontend Lead (Sonnet 4.8)
  |     |-- Component Agent (Haiku 4.8)
  |     |-- State Management Agent (Haiku 4.8)
  |-- Infra Lead (Sonnet 4.8)
  |     |-- Deployment Agent (Haiku 4.8)
  |-- Critic (Sonnet 4.8)

Quality Gates

The critic is what keeps this honest. Every piece of code has to clear a set of gates before it lands in the codebase:

  1. Syntax gate: compiles without errors (automatic)
  2. Test gate: all existing tests pass, and new code ships with new tests (automatic)
  3. Style gate: matches the team's conventions (automatic plus model-based)
  4. Security gate: no obvious vulnerabilities (model-based scan)
  5. Architecture gate: fits the project architecture (critic evaluation)
  6. Integration gate: works alongside the other components (integration test)

Only the first two gates are fully automatic. Gates 3 to 6 lean on model-based judgment, which is imperfect, but it still beats shipping with no review at all. The conductor decides when to trust the critic and when to push a call up to a person.

Dynamic Rebalancing

This is where Claude Code's dynamic workflows earn their keep. The feature lets Claude run several subagents in parallel, split work into subtasks, and check the results, which is what makes rebalancing possible. Say the backend section is crawling while the frontend section sits idle, blocked on API contracts it hasn't been handed yet. The conductor can shuffle agents around: pull a frontend agent over to help design the API, or spin up extra backend agents to knock out independent endpoints at the same time.

Self-Correction Loops

When the critic rejects something, a good conductor doesn't just bounce it back for a redo. It reads the pattern in the rejections. If the critic keeps flagging style problems in one section, the conductor rewrites that section lead's instructions. If the problem is architectural drift, the conductor may revise the plan. If the critic is being too strict, the conductor dials the threshold back.

That loop means the team gets better as it works. Early on, rejections come thick and fast. After a few rounds, the section leads have absorbed the project's standards and the rejection rate falls.

Integration with Existing Tooling

This approach doesn't throw out the tools you already use. It runs them. Git handles branches, commits, and PRs. CI/CD enforces the quality gates. Issue trackers carry the status. Documentation stays close to the code. Human review still happens, just after the critic's pass rather than in place of it.

Current Limitations

Be clear-eyed here, because much of this is still the author's framework rather than a settled industry standard. It works best on greenfield projects with clear requirements, on refactoring jobs with well-defined scope, and on standard patterns like CRUD APIs, component libraries, and deployment pipelines.

It struggles with the rest: novel architectural calls that need taste and judgment, debugging a production incident with the clock running, coordinating across external teams and dependencies, and requirements that shift halfway through the build.

Those limits are likely to move. As models get better and the tooling matures, the line of what a meta-agent can handle should keep creeping outward. The interesting question isn't whether this approach eventually takes on complex projects too. It's when.

Source trail

Primary references to keep this briefing grounded

AI and automation information changes quickly. Use these official or primary references to verify the claims, pricing, product behaviour, and compliance details before committing budget or production data.

What to do next

  1. Pick the smallest useful workflow that proves the pattern.
  2. Write down the owner, data boundary, review point, and success measure.
  3. Review the result after the first real run and decide whether to scale, change, or stop.

Want help applying this? Explore AI agent design systems.

AI Kick Start is an Illawarra-based AI studio in Figtree, helping businesses across Wollongong, Shellharbour and Kiama and right across Australia put AI to work.

Explore with AI

Use the article as a decision prompt

Summarise this AI Kick Start article for an Australian business owner. Focus on the useful decision, the risks, and the first practical next step: Agentic Coding 2.0: Agents Conducting Other Agents

Turn this into a practical roadmap.

Use the guide as a starting point, then map the first workflow worth building.

Book an AI strategy call