Multi-Model Routing: Using the Right Model for Each Task.

Opus 4.8 for architecture, Sonnet 4.8 for coding, Haiku 4.8 for classification. How to implement task-based, complexity-based, and cascading routing strategies with real cost savings.

Decision

Start narrow

Use the article to decide the smallest useful workflow worth testing before expanding the system.

Risk to watch

Hype drift

Avoid turning a practical adoption step into a broad transformation promise nobody can verify.

Proof to collect

Business signal

Write down the owner, data boundary, review point, and measurable outcome before the first build.

Key takeaways

  • Briefing: Not every task needs your most powerful model.
  • The Model Selection Matrix: Different tasks need different capabilities.
  • Routing Strategies: Route by task type, by estimated complexity, or by cascading from a cheap model to a stronger one only when confidence is low. This keeps the cost down on easy work while still landing the quality on hard work.
  • OpenHuman's Automatic Routing: [OpenHuman's single-subscription multi-model routing](https://tinyhumans.gitbook.io/openhuman/features/model-routing) is the easiest version to live with.
  • Measuring Routing Effectiveness: Track cost per task, quality degradation (the author suggests keeping any success-rate change under 5%), escalation rate (the author's target is under 20%), latency, and savings rate. These targets are the author's recommended heuristics, not measured results from a cited study.

Briefing

Not every task needs your most powerful model. Multi-model routing, which means using the right model for each task, is one of the most effective cost levers in agentic coding. OpenHuman does this automatically under its single subscription. Hermes supports it through OpenRouter. Claude Code supports it via sub-agent model selection. Here is how to put it into practice.

The Model Selection Matrix

Different tasks need different capabilities. The table below maps common coding jobs to a sensible model tier and rough cost band. A note of caution before you read it: the specific model names and per-token prices here have not held up to checking. The "4.8" Haiku and Sonnet versions in particular do not exist as released Anthropic models (the latest as of mid-2026 are Haiku 4.5 and Sonnet 4.6), and the prices listed are well off current rates. Treat the matrix as a way of thinking about tiers, not as a price sheet.

| Task type | Recommended model | Why | Cost per 1M output tokens (USD) |
| --- | --- | --- | --- |
| Simple classification | Haiku 4.8 / GPT-4.1-mini | Fast, cheap, sufficient | $0.60-$1.25 |
| Code completion | Sonnet 4.8 / GPT-4.1 | Good balance of quality and speed | $10.00-$15.00 |
| Complex refactoring | Opus 4.8 | Deep reasoning, large context | $75.00 |
| Architecture design | Opus 4.8 / GPT-4.1 | Best reasoning capabilities | $10.00-$75.00 |
| Test generation | Sonnet 4.8 | Good pattern recognition | $15.00 |
| Documentation | Sonnet 4.8 / Haiku 4.8 | Language generation | $1.25-$15.00 |
| Security review | Opus 4.8 | Needs deep analysis | $75.00 |
| Debugging | Sonnet 4.8 | Good at pattern matching | $15.00 |

For reference, the released top-tier model is Claude Opus 4.8, which current sources put at roughly $25 per 1M output tokens rather than the $75 shown above. The principle stands regardless of the exact numbers: reserve your dearest, slowest model for the jobs that actually need deep reasoning, and push everything else down to cheaper tiers.
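
To make the tiering principle concrete, the arithmetic can be sketched as follows. This is a minimal sketch: the prices are the table's own illustrative output-token figures (which the article already flags as unreliable), and the workload split and token counts are invented purely for the example.

```python
# Illustrative output prices in USD per 1M tokens, taken from the table above.
# They are placeholders for the arithmetic, not current rates.
PRICE_PER_1M_OUT = {"haiku": 1.25, "sonnet": 15.00, "opus": 75.00}


def output_cost(tier: str, output_tokens: int) -> float:
    """Cost in USD of generating `output_tokens` tokens on the given tier."""
    return PRICE_PER_1M_OUT[tier] * output_tokens / 1_000_000


# Hypothetical workload: (tier the task is routed to, output tokens produced).
workload = [("haiku", 200_000), ("sonnet", 600_000), ("opus", 200_000)]

# Cost when each task runs on its routed tier.
routed = sum(output_cost(tier, tokens) for tier, tokens in workload)
# Cost if every task had run on the top tier instead.
all_opus = sum(output_cost("opus", tokens) for _, tokens in workload)

print(f"routed: ${routed:.2f}, all-opus: ${all_opus:.2f}")
```

With these made-up numbers the routed workload costs about a third of the all-Opus one. The ratio depends entirely on the task mix and on real prices, so rerun the sum with your own figures before drawing conclusions.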

Routing Strategies

Strategy 1: Task-Based Routing

Route based on what kind of task it is:

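# Task type -> model. The model names are the article's placeholders.
ROUTING_MAP = {
    "classification": "haiku-4.8",
    "code_generation": "sonnet-4.8",
    "refactoring": "sonnet-4.8",
    "architecture": "opus-4.8",
    "debugging": "sonnet-4.8",
    "documentation": "haiku-4.8",
    "security_review": "opus-4.8",
}
DEFAULT_MODEL = "sonnet-4.8"  # mid-tier fallback for unrecognised task types


def select_model(task_description: str) -> str:
    # classify_task() is assumed to return one of the ROUTING_MAP keys.
    task_type = classify_task(task_description)
    return ROUTING_MAP.get(task_type, DEFAULT_MODEL)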
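
The routing function leans on a `classify_task` helper that the article does not show. One cheap way to fill it in, sketched here as an assumption rather than the author's method, is a first-match keyword classifier. The patterns and their order are hypothetical, and in practice a small model could do the classification instead.

```python
import re

# Hypothetical keyword rules. Order matters: the first matching rule wins.
TASK_PATTERNS = [
    ("security_review", r"\b(security|vulnerab\w*|cve)\b"),
    ("architecture", r"\b(architect\w*|system design|design doc)\b"),
    ("debugging", r"\b(bug|debug|traceback|crash\w*|stack trace)\b"),
    ("refactoring", r"\b(refactor\w*|rename|clean up)\b"),
    ("documentation", r"\b(docs?|docstring|readme|document\w*)\b"),
    ("classification", r"\b(classify|label|categori[sz]e|triage)\b"),
]


def classify_task(task_description: str) -> str:
    """Return a task type for routing, defaulting to code generation."""
    text = task_description.lower()
    for task_type, pattern in TASK_PATTERNS:
        if re.search(pattern, text):
            return task_type
    return "code_generation"


print(classify_task("Fix the crash in the parser"))
print(classify_task("Add pagination to the users endpoint"))
```

Keyword rules are brittle, so it is worth logging which rule fired and reviewing the misroutes before trusting the classifier with expensive tiers.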

Strategy 2: Complexity-Based Routing

Estimate complexity and route on that:

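def select_model_by_complexity(files_touched: int, lines_changed: int, task: str) -> str:
    # Weighted sum: each file counts 10, each changed line 0.1, plus a
    # keyword-based adjustment from keyword_weight() (assumed to return a number).
    complexity_score = (
        files_touched * 10
        + lines_changed * 0.1
        + keyword_weight(task)
    )
    # Thresholds are heuristics: above 100 goes to the top tier, above 30 to the middle.
    if complexity_score > 100:
        return "opus-4.8"
    if complexity_score > 30:
        return "sonnet-4.8"
    return "haiku-4.8"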
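
The `keyword_weight` helper is left undefined in the article. A minimal sketch of what it might look like follows, together with a worked score for two example tasks. The keyword table and its weights are hypothetical and would need tuning against your own task history.

```python
# Hypothetical weights: words that tend to signal harder work raise the score,
# words that signal trivial work lower it.
KEYWORD_WEIGHTS = {
    "migrate": 40,
    "architecture": 40,
    "concurrency": 30,
    "security": 30,
    "refactor": 20,
    "rename": -10,
    "typo": -20,
}


def keyword_weight(task: str) -> int:
    """Sum the weights of every known keyword that appears as a word in the task."""
    words = set(task.lower().split())
    return sum(weight for keyword, weight in KEYWORD_WEIGHTS.items() if keyword in words)


def complexity_score(files_touched: int, lines_changed: int, task: str) -> float:
    """Same formula as the routing function above, exposed for inspection."""
    return files_touched * 10 + lines_changed * 0.1 + keyword_weight(task)


small = complexity_score(1, 20, "fix typo in readme")
large = complexity_score(8, 400, "migrate auth to new architecture")
print(small, large)
```

The first task scores about -8 (one file, twenty lines, a "typo" discount) and falls in the cheapest tier. The second scores about 200 and clears the 100 threshold for the top tier.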

Strategy 3: Cascading Routing

Start cheap and escalate only if you have to:

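def cascading_route(prompt: str, threshold: float = 0.8):
    # Try the cheapest model first and escalate only while confidence is low.
    # generate() is assumed to return an object with a .confidence score in [0, 1];
    # the function returns that result object, not a plain string.
    for model in ("haiku-4.8", "sonnet-4.8", "opus-4.8"):
        result = generate(prompt, model=model)
        if result.confidence >= threshold:
            break
    return result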

This keeps costs down on easy work while still delivering quality on hard work. The author's rule of thumb is that escalation adds one or two extra calls for the tasks that need it, though that is a planning heuristic rather than a measured figure.
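
To see the escalation behaviour without calling any provider, the cascade can be exercised against a stub. Everything here is an assumption for illustration: `fake_generate` stands in for a real model call, the confidence values are scripted, and the article does not say where a confidence score would come from (a self-rating, a verifier model, or a test run are all plausible sources).

```python
from dataclasses import dataclass


@dataclass
class Result:
    text: str
    confidence: float
    model: str


def fake_generate(prompt: str, model: str, scripted: dict[str, float]) -> Result:
    """Stand-in for a real model call; confidence is looked up from `scripted`."""
    return Result(text=f"[{model}] {prompt}", confidence=scripted[model], model=model)


def cascade(prompt: str, scripted: dict[str, float], threshold: float = 0.8):
    """Return the first sufficiently confident result plus the list of models tried."""
    tried = []
    result = None
    for model in ("haiku-4.8", "sonnet-4.8", "opus-4.8"):
        result = fake_generate(prompt, model, scripted)
        tried.append(model)
        if result.confidence >= threshold:
            break
    return result, tried


easy = {"haiku-4.8": 0.93, "sonnet-4.8": 0.95, "opus-4.8": 0.97}
hard = {"haiku-4.8": 0.41, "sonnet-4.8": 0.66, "opus-4.8": 0.90}

print(cascade("rename a variable", easy)[1])
print(cascade("redesign the job queue", hard)[1])
```

The easy prompt stops after one call. The hard prompt pays for all three, so a cascade only saves money when most traffic resolves at the first tier.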

Strategy 4: Provider-Based Routing

Hermes' OpenRouter integration lets you route at the provider level. The snippet below is illustrative, not a verified API, but it shows the shape of the idea:

# Route based on provider strengths
hermes.config.set("routing.providers", {
    "anthropic": { "priority": 1, "models": ["opus-4.8", "sonnet-4.8"] },
    "openai": { "priority": 2, "models": ["gpt-4.1"] },
    "nous": { "priority": 3, "models": ["hermes-3"] },
    "z.ai": { "priority": 4, "models": ["glm-4-plus"] },
    "kimi": { "priority": 5, "models": ["kimi-k2"] },
})

OpenRouter's catalogue gives you plenty of flexibility (it listed 200+ models a while back and now carries well over 400). The hard part is choosing. Too many options just create decision paralysis. Most teams are better off starting with three to five models that cover small, medium and large task sizes.
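
A short list of providers with a fallback order is simple enough to express directly. The sketch below is plain Python, not the Hermes or OpenRouter API. The `Provider` type, the `first_available` helper, and the idea of passing in a set of unavailable providers are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str
    priority: int            # lower number = tried first
    models: tuple            # models served, preferred model first


# A deliberately short list, mirroring the first entries of the config above.
PROVIDERS = [
    Provider("openai", 2, ("gpt-4.1",)),
    Provider("anthropic", 1, ("opus-4.8", "sonnet-4.8")),
    Provider("nous", 3, ("hermes-3",)),
]


def first_available(unavailable=frozenset()):
    """Return (provider, model) for the highest-priority provider that is up."""
    for provider in sorted(PROVIDERS, key=lambda p: p.priority):
        if provider.name not in unavailable:
            return provider.name, provider.models[0]
    return None  # every provider is down


print(first_available())
print(first_available(frozenset({"anthropic"})))
```

Sorting by an explicit priority field keeps the fallback order independent of the order in which providers happen to be listed.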

OpenHuman's Automatic Routing

OpenHuman's single-subscription multi-model routing is the easiest version to live with. It picks the cheapest adequate model for each task without you having to think about it. Local models handle simple classification, cloud models handle harder synthesis, and the premium models stay in reserve for jobs that genuinely warrant them.

The routing logic weighs up:

  • Task type (classification, generation, reasoning)
  • Context size (small context goes to a smaller model)
  • Required quality (judged on past success rates)
  • Current provider latency and availability
  • Cost per token
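
The "cheapest adequate model" idea behind these factors can be sketched as a filter followed by a minimum. This is not OpenHuman's implementation, which the article does not show. The `Candidate` fields, model names, and numbers are hypothetical, and latency is left out to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    cost_per_1m_out: float   # USD, placeholder values
    success_rate: float      # past success rate on this task type, 0..1
    max_context: int         # tokens
    available: bool = True


def cheapest_adequate(candidates, required_quality: float, context_tokens: int):
    """Pick the lowest-cost available model that meets quality and context needs."""
    adequate = [
        c for c in candidates
        if c.available
        and c.success_rate >= required_quality
        and c.max_context >= context_tokens
    ]
    # default=None covers the case where no candidate qualifies.
    return min(adequate, key=lambda c: c.cost_per_1m_out, default=None)


candidates = [
    Candidate("local-small", 0.0, 0.70, 8_000),
    Candidate("cloud-mid", 15.0, 0.88, 200_000),
    Candidate("premium", 75.0, 0.95, 200_000),
]

print(cheapest_adequate(candidates, required_quality=0.6, context_tokens=2_000).name)
print(cheapest_adequate(candidates, required_quality=0.9, context_tokens=50_000).name)
```

Filtering first and minimising cost second means quality and context act as hard constraints, while cost only breaks ties among models that already qualify.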

OpenHuman's docs frame routing as a way to cut cost and latency. The article's "40-60% cost savings versus using a single large model" figure is presented as user-reported, but it is not independently confirmed, so read it as illustrative rather than a benchmark.

Measuring Routing Effectiveness

Track these:

  • Cost per task: should fall once routing is on
  • Quality degradation: should stay small (the author suggests keeping any success-rate change under 5%)
  • Escalation rate: share of tasks that need a bigger model (the author's target is under 20%)
  • Latency: average time to completion
  • Savings rate: (single-model cost minus routed cost) divided by single-model cost

The escalation and quality targets above are the author's recommended heuristics, not measured results from a cited study.
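
The metrics above reduce to a few sums over a task log. A minimal sketch follows, assuming each task is logged with its routed cost, an estimate of what it would have cost on the single large model, and whether it escalated and succeeded. The `TaskRecord` fields and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TaskRecord:
    routed_cost: float     # USD actually spent with routing on
    baseline_cost: float   # USD estimate for the same task on the single large model
    escalated: bool        # True if a bigger model had to be called
    succeeded: bool
    latency_s: float


def routing_metrics(records, baseline_success_rate: float) -> dict:
    """Compute the five tracking metrics over a non-empty list of task records."""
    if not records:
        raise ValueError("need at least one task record")
    n = len(records)
    routed = sum(r.routed_cost for r in records)
    baseline = sum(r.baseline_cost for r in records)
    success_rate = sum(r.succeeded for r in records) / n
    return {
        "cost_per_task": routed / n,
        "quality_change": success_rate - baseline_success_rate,
        "escalation_rate": sum(r.escalated for r in records) / n,
        "avg_latency_s": sum(r.latency_s for r in records) / n,
        # (single-model cost minus routed cost) divided by single-model cost
        "savings_rate": (baseline - routed) / baseline,
    }


log = [
    TaskRecord(1.0, 4.0, False, True, 1.0),
    TaskRecord(1.0, 4.0, False, True, 1.0),
    TaskRecord(2.0, 4.0, True, True, 2.0),
    TaskRecord(4.0, 4.0, True, False, 4.0),
]
metrics = routing_metrics(log, baseline_success_rate=0.8)
print(metrics)
```

In this made-up log the savings rate is 50%, but half the tasks escalated and the success rate dropped by five points, so the sample would miss the escalation heuristic and sit right on the quality one.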

Multi-model routing is not really about using cheaper models. It is about using the right model for each task. The savings are what falls out of choosing well.

What to do next

  1. Pick the smallest useful workflow that proves the pattern.
  2. Write down the owner, data boundary, review point, and success measure.
  3. Review the result after the first real run and decide whether to scale, change, or stop.

AI Kick Start is an Illawarra-based AI studio in Figtree, helping businesses across Wollongong, Shellharbour and Kiama and right across Australia put AI to work.
