Briefing
Not every task needs your most powerful model. Multi-model routing, which means using the right model for each task, is the single most effective cost lever in agentic coding. OpenHuman does this automatically under its single subscription. Hermes supports it through OpenRouter. Claude Code supports it via sub-agent model selection. Here is how to put it into practice.
The Model Selection Matrix
Different tasks need different capabilities. The table below maps common coding jobs to a sensible model tier and rough cost band. A note of caution before you read it: the specific model names and per-token prices here have not held up to checking. The "4.8" Haiku and Sonnet versions in particular do not exist as released Anthropic models (the latest as of mid-2026 are Haiku 4.5 and Sonnet 4.6), and the prices listed are well off current rates. Treat the matrix as a way of thinking about tiers, not as a price sheet.
| Task Type | Recommended Model | Why | Cost per 1M output tokens (USD) |
|---|---|---|---|
| Simple classification | Haiku 4.8 / GPT-4.1-mini | Fast, cheap, sufficient | $0.60-$1.25 |
| Code completion | Sonnet 4.8 / GPT-4.1 | Good balance of quality and speed | $10.00-$15.00 |
| Complex refactoring | Opus 4.8 | Deep reasoning, large context | $75.00 |
| Architecture design | Opus 4.8 / GPT-4.1 | Best reasoning capabilities | $10.00-$75.00 |
| Test generation | Sonnet 4.8 | Good pattern recognition | $15.00 |
| Documentation | Sonnet 4.8 / Haiku 4.8 | Language generation | $1.25-$15.00 |
| Security review | Opus 4.8 | Needs deep analysis | $75.00 |
| Debugging | Sonnet 4.8 | Good at pattern matching | $15.00 |
For reference, the released top-tier model is Claude Opus 4.8, which current sources put at roughly $25 per 1M output tokens rather than the $75 shown above. The principle stands regardless of the exact numbers: reserve your dearest, slowest model for the jobs that actually need deep reasoning, and push everything else down to cheaper tiers.
Routing Strategies
Strategy 1: Task-Based Routing
Route based on what kind of task it is:

    def select_model(task_description: str) -> str:
        # classify_task is assumed to be defined elsewhere and to return one of the keys below.
        task_type = classify_task(task_description)
        routing_map = {
            "classification": "haiku-4.8",
            "code_generation": "sonnet-4.8",
            "refactoring": "sonnet-4.8",
            "architecture": "opus-4.8",
            "debugging": "sonnet-4.8",
            "documentation": "haiku-4.8",
            "security_review": "opus-4.8",
        }
        # Unknown task types fall back to the mid-tier model.
        return routing_map.get(task_type, "sonnet-4.8")

Strategy 2: Complexity-Based Routing
Estimate complexity and route on that:
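
    def select_model_by_complexity(files_touched: int, lines_changed: int, task: str) -> str:
        # keyword_weight is assumed to be defined elsewhere and to return a number
        # reflecting how demanding the wording of the task is.
        complexity_score = (
            files_touched * 10
            + lines_changed * 0.1
            + keyword_weight(task)
        )
        # The 100 and 30 cut-offs are starting points to tune against your own workload.
        if complexity_score > 100:
            return "opus-4.8"
        if complexity_score > 30:
            return "sonnet-4.8"
        return "haiku-4.8"

Strategy 3: Cascading Routing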
Start cheap and escalate only if you have to:

    def cascading_route(prompt: str):
        # generate() is assumed to return a result object with a .confidence score in [0, 1].
        # Model APIs generally do not report one directly, so it has to come from your
        # own check (tests passing, a validator, or a grader model).
        # Try Haiku first for simple tasks
        result = generate(prompt, model="haiku-4.8")
        if result.confidence < 0.8:
            result = generate(prompt, model="sonnet-4.8")
        if result.confidence < 0.8:
            result = generate(prompt, model="opus-4.8")
        return result

This keeps the cost down on easy work while still reaching the required quality on hard work. The author's rule of thumb is that escalation adds one or two extra calls for the tasks that need it, though that is a planning heuristic rather than a measured figure.
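The cascade can be exercised end to end with a stand-in for `generate`. The sketch below is illustrative only: `Result`, `cascade`, `fake_generate`, and the way confidence is computed are assumptions made for the example, not a real client API. It also returns the number of calls made, which is the quantity the escalation rule of thumb is about.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Result:
    text: str
    confidence: float  # assumed 0..1 score from your own evaluator
    model: str


def cascade(
    prompt: str,
    generate: Callable[[str, str], Result],
    tiers: Sequence[str] = ("haiku-4.8", "sonnet-4.8", "opus-4.8"),
    threshold: float = 0.8,
) -> tuple[Result, int]:
    """Try each tier in order and stop at the first result at or above threshold.

    Returns the last result obtained and the number of calls made.
    """
    if not tiers:
        raise ValueError("tiers must not be empty")
    calls = 0
    result = None
    for model in tiers:
        result = generate(prompt, model)
        calls += 1
        if result.confidence >= threshold:
            break
    return result, calls


def fake_generate(prompt: str, model: str) -> Result:
    # Hypothetical stand-in: longer prompts count as harder,
    # and larger tiers are more confident on the same prompt.
    strength = {"haiku-4.8": 0.9, "sonnet-4.8": 1.2, "opus-4.8": 2.0}[model]
    difficulty = 1.0 + len(prompt) / 100
    confidence = min(1.0, strength / difficulty)
    return Result(text=f"[{model}] answer", confidence=confidence, model=model)


easy, easy_calls = cascade("rename x", fake_generate)
hard, hard_calls = cascade("x" * 120, fake_generate)
print(easy.model, easy_calls)
print(hard.model, hard_calls)
```

Writing the tiers as a sequence rather than nested `if` statements makes it easy to add or drop a tier, and the call count can feed directly into an escalation-rate metric.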
Strategy 4: Provider-Based Routing
Hermes' OpenRouter integration lets you route at the provider level. The snippet below is illustrative, not a verified API, but it shows the shape of the idea:

    # Route based on provider strengths
    hermes.config.set("routing.providers", {
        "anthropic": { "priority": 1, "models": ["opus-4.8", "sonnet-4.8"] },
        "openai": { "priority": 2, "models": ["gpt-4.1"] },
        "nous": { "priority": 3, "models": ["hermes-3"] },
        "z.ai": { "priority": 4, "models": ["glm-4-plus"] },
        "kimi": { "priority": 5, "models": ["kimi-k2"] },
    })

OpenRouter's catalogue gives you plenty of flexibility (it listed 200+ models a while back and now carries well over 400). The hard part is choosing. Too many options just create decision paralysis. Most teams are better off starting with three to five models that cover small, medium and large task sizes.
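One way to read the priority numbers in a provider map like this is as a fallback order: use the highest-priority provider that is currently reachable. The sketch below works under that assumption. `PROVIDERS`, `pick_provider`, and the `is_available` callback are hypothetical names for illustration and are not part of Hermes or OpenRouter.

```python
from typing import Callable, Optional

# Trimmed, illustrative provider map; lower number means higher priority.
PROVIDERS = {
    "anthropic": {"priority": 1, "models": ["opus-4.8", "sonnet-4.8"]},
    "openai": {"priority": 2, "models": ["gpt-4.1"]},
    "nous": {"priority": 3, "models": ["hermes-3"]},
}


def pick_provider(
    providers: dict,
    is_available: Callable[[str], bool],
) -> Optional[tuple[str, str]]:
    """Return (provider, first listed model) for the best available provider.

    Returns None when no provider is available.
    """
    ordered = sorted(providers.items(), key=lambda item: item[1]["priority"])
    for name, cfg in ordered:
        if cfg["models"] and is_available(name):
            return name, cfg["models"][0]
    return None


# Simulate an outage at the top-priority provider.
down = {"anthropic"}
choice = pick_provider(PROVIDERS, lambda name: name not in down)
print(choice)
```

In practice the availability check would be a health probe or a recent-error counter; here it is a simple set lookup so the example stays self-contained.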
OpenHuman's Automatic Routing
OpenHuman's single-subscription multi-model routing is the easiest version to live with. It picks the cheapest adequate model for each task without you having to think about it. Local models handle simple classification, cloud models handle harder synthesis, and the premium models stay in reserve for jobs that genuinely warrant them.
The routing logic weighs up:
- Task type (classification, generation, reasoning)
- Context size (small context goes to a smaller model)
- Required quality (judged on past success rates)
- Current provider latency and availability
- Cost per token
OpenHuman's docs frame routing as a way to cut cost and latency. The article's "40-60% cost savings versus using a single large model" figure is presented as user-reported, but it is not independently confirmed, so read it as illustrative rather than a benchmark.
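The idea of a "cheapest adequate model" can be made concrete as a filter followed by a minimum. The sketch below is a hypothetical reading of the factors listed above, not OpenHuman's actual routing logic: the `Candidate` fields, the `cheapest_adequate` function, and every number in `MODELS` are invented for illustration. Task type is folded into `success_rate`, which is assumed to be the past success rate for the kind of task being routed.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    name: str
    cost_per_1m_out: float  # placeholder price, not a real rate
    max_context: int        # largest context the model accepts, in tokens
    success_rate: float     # assumed past success rate for this task type, 0..1
    latency_ms: float       # assumed current typical latency
    available: bool = True


def cheapest_adequate(
    candidates: Sequence[Candidate],
    context_tokens: int,
    min_success: float,
    max_latency_ms: float,
) -> Optional[Candidate]:
    """Drop candidates that fail any requirement, then take the cheapest survivor."""
    adequate = [
        c for c in candidates
        if c.available
        and c.max_context >= context_tokens
        and c.success_rate >= min_success
        and c.latency_ms <= max_latency_ms
    ]
    return min(adequate, key=lambda c: c.cost_per_1m_out, default=None)


MODELS = [
    Candidate("small", 1.0, 32_000, 0.80, 300),
    Candidate("medium", 10.0, 200_000, 0.92, 900),
    Candidate("large", 50.0, 200_000, 0.97, 2500),
]

pick_easy = cheapest_adequate(MODELS, 4_000, 0.75, 5_000)
pick_hard = cheapest_adequate(MODELS, 100_000, 0.95, 5_000)
pick_none = cheapest_adequate(MODELS, 100_000, 0.99, 5_000)
print(pick_easy, pick_hard, pick_none)
```

Treating quality, context, and latency as hard requirements and cost as the only thing to minimise keeps the rule easy to audit. A weighted score is the obvious alternative, at the price of tuning the weights.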
Measuring Routing Effectiveness
Track these:
- Cost per task: should fall once routing is on
- Quality degradation: should stay small (the author suggests keeping any success-rate change under 5%)
- Escalation rate: share of tasks that need a bigger model (the author's target is under 20%)
- Latency: average time to completion
- Savings rate: (single-model cost minus routed cost) divided by single-model cost
The escalation and quality targets above are the author's recommended heuristics, not measured results from a cited study.
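These metrics are simple enough to compute from a per-task log. The following is a minimal sketch, assuming each task is recorded with its routed cost, an estimated cost on the single large model, and outcome flags. `TaskRecord` and `routing_metrics` are hypothetical names, and the sample values are made up.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class TaskRecord:
    routed_cost: float    # cost of the task with routing on
    baseline_cost: float  # estimated cost on the single large model
    escalated: bool       # True if a bigger model was needed
    succeeded: bool
    latency_s: float


def routing_metrics(records: Sequence[TaskRecord], baseline_success_rate: float) -> dict:
    """Summarise routing effectiveness over a batch of task records."""
    n = len(records)
    if n == 0:
        raise ValueError("no records to summarise")
    routed = sum(r.routed_cost for r in records)
    baseline = sum(r.baseline_cost for r in records)
    if baseline <= 0:
        raise ValueError("baseline cost must be positive")
    success_rate = sum(r.succeeded for r in records) / n
    return {
        "cost_per_task": routed / n,
        # Negative values mean quality dropped relative to the single-model baseline.
        "quality_change": success_rate - baseline_success_rate,
        "escalation_rate": sum(r.escalated for r in records) / n,
        "avg_latency_s": sum(r.latency_s for r in records) / n,
        # (single-model cost minus routed cost) divided by single-model cost
        "savings_rate": (baseline - routed) / baseline,
    }


sample = [
    TaskRecord(0.25, 1.0, False, True, 1.0),
    TaskRecord(0.25, 1.0, False, True, 1.0),
    TaskRecord(0.5, 1.0, False, True, 2.0),
    TaskRecord(1.0, 1.0, True, True, 4.0),
]
metrics = routing_metrics(sample, baseline_success_rate=1.0)
print(metrics)
```

With this sample the routed total is 2.0 against a baseline of 4.0, so the savings rate is 0.5, and one task in four escalated, which would sit just above the under-20% target.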
Multi-model routing is not really about using cheaper models. It is about using the right model for each task. The savings are what falls out of choosing well.




