Briefing
Watch any AI coding demo and the pitch is the same: type a sentence, wait thirty seconds, get a finished feature. The crowd claps. Then you point the same tool at your own codebase, the one with fifteen years of patches, half-finished migrations and tooling nobody documented, and it falls apart.
That gap between the demo and the day job is the whole reason Anthropic shipped the Task System in Claude Code. It landed in January 2026 alongside Opus 4.5 and Claude Code 2.1, replacing the older "Todos" checklist with something built to survive long, messy projects (VentureBeat). The "anti-hype" label is mine, not Anthropic's. But it fits, because the feature is deliberately unglamorous.
For an Australian business team weighing up agentic coding tools, the question isn't whether an AI can write a tidy function on a blank page. It's whether it can keep its head when the work gets complicated and hand back control when it should. That's what this system is trying to do, and it's worth understanding how before you trust it on real work.
The Hype Problem
Agent demos are cherry-picked. The prompt is rehearsed, the codebase is clean, and the failures get cut from the tape. Real engineering doesn't work that way. It's vague requirements, legacy constraints, and a half-built feature from someone who left the company three years ago. An agent that writes lovely code on a greenfield project can be useless the moment it touches a brownfield one.
The Task System is built around that reality. It doesn't promise to "just get it done." It promises to decompose, execute, checkpoint, and recover, which is roughly what a good senior engineer does when handed a job they've never seen before.
Hierarchical Task Decomposition
When you submit a request, the Task System reads it and breaks it into sub-tasks with explicit dependencies. This is more than a flat checklist. Anthropic's Tasks support dependencies and parent-child relationships, so work nests as Project, then Feature, then Component, then the leaf tasks, managed through the TaskCreate, TaskUpdate, TaskList and TaskGet tools (VentureBeat).
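To make the parent-child idea concrete, here is a minimal Python sketch. It nests work the way the paragraph describes, then walks the tree to find the leaf tasks, which are the only nodes that carry directly executable work. The TaskNode class, its fields and the example titles are hypothetical. This is not the schema behind TaskCreate or TaskGet.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class TaskNode:
    """Hypothetical node in a Project > Feature > Component > task hierarchy."""
    title: str
    level: str  # "Project", "Feature", "Component" or "Task"
    children: list[TaskNode] = field(default_factory=list)


def leaf_tasks(
    node: TaskNode, path: tuple[str, ...] = ()
) -> Iterator[tuple[tuple[str, ...], TaskNode]]:
    """Yield (ancestor path, leaf) pairs, depth first, for nodes with no children."""
    path = path + (node.title,)
    if not node.children:
        yield path, node
        return
    for child in node.children:
        yield from leaf_tasks(child, path)


# Illustrative data only: one project, one feature, two components.
project = TaskNode("Billing service", "Project", [
    TaskNode("GraphQL API", "Feature", [
        TaskNode("Schema", "Component", [TaskNode("Write schema.graphql", "Task")]),
        TaskNode("Resolvers", "Component", [
            TaskNode("Invoice resolvers", "Task"),
            TaskNode("Resolver tests", "Task"),
        ]),
    ]),
])

for path, _leaf in leaf_tasks(project):
    print(" > ".join(path))
```

Keeping the ancestor path alongside each leaf means a progress report can always say where a piece of work sits in the larger job.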
One way to picture each node (the schema is an illustration, not a documented Anthropic spec) is that every sub-task carries:
- Objective: What this sub-task must accomplish
- Inputs: Files, context, and state required
- Outputs: Expected artefacts (files, tests, documentation)
- Dependencies: Which other sub-tasks must complete first
- Estimated complexity: Low, medium, high, or unknown
- Verification criteria: How to confirm successful completion
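Here is a minimal Python sketch of that record, assuming the field names above map to plain attributes. It also shows one way to derive a safe execution order from the dependencies, using the standard library's graphlib. The SubTask class and the execution_order helper are hypothetical. The data is the REST-to-GraphQL migration example that follows, reduced to its essentials.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class SubTask:
    """Hypothetical record mirroring the illustrative fields (not an Anthropic schema)."""
    id: int
    objective: str
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)
    dependencies: list[int] = field(default_factory=list)
    complexity: str = "unknown"  # "low", "medium", "high" or "unknown"
    verification: str = ""       # how to confirm the sub-task succeeded


subtasks = [
    SubTask(1, "Define GraphQL schema", ["OpenAPI spec"], ["schema.graphql"],
            complexity="medium", verification="schema.graphql parses without errors"),
    SubTask(2, "Implement resolvers", ["schema.graphql"], ["src/resolvers/**/*.ts"],
            dependencies=[1], complexity="high"),
    SubTask(3, "Add GraphQL server middleware", ["src/app.ts"], ["src/app.ts"],
            dependencies=[2], complexity="low"),
    SubTask(4, "Write resolver tests", ["src/resolvers/**/*.ts"],
            ["src/resolvers/**/*.test.ts"], dependencies=[2], complexity="medium"),
    SubTask(5, "Update API documentation", ["schema.graphql"], ["docs/api.md"],
            dependencies=[1], complexity="low"),
]


def execution_order(tasks: list[SubTask]) -> list[int]:
    """Return sub-task ids so that each one appears after all of its dependencies."""
    graph = {t.id: set(t.dependencies) for t in tasks}
    return list(TopologicalSorter(graph).static_order())


print(execution_order(subtasks))
```

Sub-tasks 2 and 5 both depend only on sub-task 1, so a scheduler that wanted parallelism could use TopologicalSorter's prepare(), get_ready() and done() methods instead of static_order(). A dependency cycle raises graphlib.CycleError, which is a reasonable signal that the decomposition itself needs a human to look at it.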
For example, a REST-to-GraphQL migration might decompose like this:
Task: Migrate from REST to GraphQL
├── Sub-task 1: Define GraphQL schema from existing REST endpoints
│   ├── Input: OpenAPI spec, current route handlers
│   ├── Output: schema.graphql
│   └── Complexity: Medium
├── Sub-task 2: Implement resolvers
│   ├── Input: schema.graphql, database models
│   ├── Output: src/resolvers/**/*.ts
│   ├── Dependencies: Sub-task 1
│   └── Complexity: High
├── Sub-task 3: Add GraphQL server middleware
│   ├── Input: src/app.ts
│   ├── Output: Updated src/app.ts
│   ├── Dependencies: Sub-task 2
│   └── Complexity: Low
├── Sub-task 4: Write tests for resolvers
│   ├── Input: src/resolvers/**/*.ts
│   ├── Output: src/resolvers/**/*.test.ts
│   ├── Dependencies: Sub-task 2
│   └── Complexity: Medium
└── Sub-task 5: Update API documentation
    ├── Input: schema.graphql
    ├── Output: docs/api.md
    ├── Dependencies: Sub-task 1
    └── Complexity: Low
State Persistence and Recovery
The part that matters most is state persistence. If a task fails at sub-task 3 of 7, the system doesn't start over. It picks up from the failure point, carrying the context of what worked and what's left. Claude Code writes tasks to the local filesystem at ~/.claude/tasks, so you can close the terminal, switch machines, or recover from a crash and still reload the project state. Tasks also survive context compactions inside long sessions (VentureBeat). That sounds obvious, but plenty of agent systems skip it, and without it they either finish in one shot or leave your codebase half-broken.
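The resume behaviour can be sketched in a few lines of Python, assuming a single JSON file of per-sub-task statuses. The file name, format and helper functions are hypothetical and do not reproduce what Claude Code stores under ~/.claude/tasks. The point is the pattern: persist status after each step, then skip completed work on reload.

```python
import json
import os
import tempfile
from pathlib import Path

# Hypothetical on-disk format for illustration only. It is NOT the real
# layout of ~/.claude/tasks, which this sketch does not attempt to reproduce.


def save_state(path: Path, statuses: dict[int, str]) -> None:
    """Persist each sub-task's status ("pending", "completed" or "failed").

    Writing to a temporary file and then os.replace()-ing it means a crash
    mid-write leaves the previous state file intact rather than half-written.
    """
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps({str(k): v for k, v in statuses.items()}))
    os.replace(tmp, path)


def load_state(path: Path) -> dict[int, str]:
    """Reload statuses; JSON object keys are strings, so convert them back to ints."""
    return {int(k): v for k, v in json.loads(path.read_text()).items()}


def remaining(order: list[int], statuses: dict[int, str]) -> list[int]:
    """Sub-tasks still to run, in order, skipping anything already completed."""
    return [t for t in order if statuses.get(t) != "completed"]


with tempfile.TemporaryDirectory() as d:
    state_file = Path(d) / "graphql-migration.json"
    # Session 1: sub-tasks 1-2 succeed, sub-task 3 fails, then the terminal closes.
    save_state(state_file, {1: "completed", 2: "completed", 3: "failed",
                            4: "pending", 5: "pending"})
    # Session 2: reload from disk and resume from the failure point.
    resumed = remaining([1, 2, 3, 4, 5], load_state(state_file))
    print(resumed)  # [3, 4, 5]
```

Note that the failed sub-task is retried rather than skipped: only "completed" counts as done.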
The exchange below is illustrative rather than a literal documented command, but it shows the shape of how a resume works:
# Task fails on sub-task 3
claude "migrate REST to GraphQL"
# [... sub-tasks 1-2 complete, sub-task 3 fails ...]
# Error: Resolver for /billing/invoices conflicts with existing middleware
# Fix the issue, resume from sub-task 3
claude "continue from sub-task 3: handle the middleware conflict"
# Task System resumes with full context of completed sub-tasks 1-2
There's a related trick for teams. Set the CLAUDE_CODE_TASK_LIST_ID environment variable and you can point several Claude instances at the same task list, which is how cross-session coordination and team collaboration are meant to work (anthropics/claude-code Issue #23816).
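A toy model shows why a shared list ID is enough for coordination. If every session derives its storage location from the same identifier, the sessions share one state, and an atomic claim stops two of them from picking up the same sub-task. Everything here except the variable name CLAUDE_CODE_TASK_LIST_ID is invented for illustration (the directory layout, the .claim files, the fallback ID). It does not describe how Claude Code actually coordinates instances.

```python
import os
import tempfile
from pathlib import Path

# Toy model only: the directory layout, ".claim" files and fallback ID are
# invented. Only the variable name CLAUDE_CODE_TASK_LIST_ID comes from the text.


def shared_list_dir(base: Path, env: dict[str, str]) -> Path:
    """Sessions that see the same CLAUDE_CODE_TASK_LIST_ID resolve the same directory."""
    list_id = env.get("CLAUDE_CODE_TASK_LIST_ID", f"session-{os.getpid()}")
    directory = base / list_id
    directory.mkdir(parents=True, exist_ok=True)
    return directory


def try_claim(list_dir: Path, task_id: int, owner: str) -> bool:
    """Atomically claim a task. On a local filesystem, O_CREAT | O_EXCL fails if
    the claim file already exists, so two sessions cannot both own one task."""
    try:
        fd = os.open(list_dir / f"{task_id}.claim", os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(owner)
    return True


with tempfile.TemporaryDirectory() as d:
    env = {"CLAUDE_CODE_TASK_LIST_ID": "graphql-migration"}
    alice = shared_list_dir(Path(d), env)  # first Claude instance
    bob = shared_list_dir(Path(d), env)    # second instance, same list ID
    same_dir = alice == bob
    first = try_claim(alice, 3, "alice")
    second = try_claim(bob, 3, "bob")      # sub-task 3 is already claimed
    print(same_dir, first, second)         # True True False
```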
The Unknown Complexity Handler
Here the description runs ahead of what Anthropic has actually published, so treat it as a way of thinking rather than a named, shipped feature. The idea is that when the system meets a sub-task it can't size up, it reportedly flags it as "unknown complexity" and switches into a research mode: instead of writing code, it explores the codebase, reads the docs, and produces a findings report. A human reads that, gives direction, and the system turns the findings into a proper plan.
Anthropic does document related behaviour: Plan Mode's "explore first" approach, effort levels, extended thinking, and subagent investigation (Claude Code Best Practices). What it doesn't describe by name is a discrete "Unknown Complexity Handler" with an automatic research mode, so the construct above is best read as a model of the philosophy.
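Read as a model rather than a feature, the handler comes down to a routing rule. The minimal Python sketch below is hypothetical throughout: the Decision class, the route function and the complexity labels illustrate the philosophy and are not Claude Code behaviour.

```python
from dataclasses import dataclass

SIZED = {"low", "medium", "high"}


@dataclass
class Decision:
    mode: str          # "execute" or "research"
    needs_human: bool  # True when a person must give direction before code is written
    next_step: str


def route(objective: str, complexity: str) -> Decision:
    """Hypothetical routing rule for the 'unknown complexity' idea.

    Sized work goes straight to execution. Anything not explicitly sized,
    including "unknown" and unexpected values, falls through to research,
    so the default on doubt is to investigate rather than write code.
    """
    if complexity.lower() in SIZED:
        return Decision("execute", False, f"implement: {objective}")
    return Decision(
        "research", True,
        f"explore codebase and docs, then write a findings report on: {objective}",
    )


print(route("Add GraphQL server middleware", "Low").mode)                 # execute
print(route("Reconcile legacy billing cron jobs", "unknown").needs_human)  # True
```

Making research the fall-through branch is the design choice that matters here: a typo or an unrecognised label leads to investigation, never to a guess.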
And that philosophy is the point. A hyped agent guesses and ships code. The cautious version admits it doesn't know and asks. You get slower starts in exchange for far fewer rollbacks, which is usually the trade a real team wants.
Integration with Plan Mode
The Task System and Plan Mode (covered in article 5) are meant to work side by side. Both are real: Plan Mode is a documented explore, plan, implement, commit workflow, and Tasks and subagents are genuine execution primitives (Claude Code Best Practices). The clean hand-off described here, where Plan Mode produces the high-level decomposition and the Task System executes each sub-task with persistence and recovery, is a useful mental model rather than a formal architecture Anthropic publishes. In practice, for a complex migration Plan Mode might sketch a 15-step plan, and the Task System works through each step, branching sub-tasks where it needs to and reporting progress back.
Realistic Expectations
The Task System doesn't replace senior engineers. It supports them. It takes on the mechanical work (boilerplate, test scaffolding, doc updates) and pushes the judgement calls back to a person. The hierarchy is what makes that happen: ambiguous work escalates to a human instead of being guessed at. That positioning lines up with Anthropic's own research, which found that the more domain expertise someone brings, the more work Claude does per instruction, leaving human judgement at the centre (Anthropic Research).
One number in the original framing should be treated with caution. A claim that the Task System completes 78% of sub-tasks autonomously on a typical codebase, with the other 22% needing human input, is presented as an Anthropic benchmark, but no Anthropic publication or third-party report contains that figure (Anthropic Research). Read it as an illustration of the intended balance (high enough to save real time, low enough to avoid silent failures), not as a verified statistic.
The Task System isn't exciting. It's reliable. In agentic coding, reliability is the feature that actually earns its keep.


