Briefing
Agentic coding is not free. Tokens cost money, compute costs money, and the engineer time spent babysitting agents costs money too. This piece pulls apart what it actually costs to run agents in production, as of June 2026.
Here is the part most vendors skip over. When a team first switches on an AI coding agent, the bill they expect (the API charges) is rarely the bill that hurts. The visible cost is the model usage. The cost that quietly eats your month is the senior engineer who spends two afternoons writing system prompts, then another reviewing what the agent shipped. None of that shows up on an invoice, which is exactly why it catches people out.
A quick warning before the numbers. Model pricing in this space moves fast, and a few of the figures below come from product pages and third-party write-ups rather than locked, official rate cards. Where the published price differs from what we could confirm, I have flagged it inline. Treat the dollar amounts as a planning starting point, not a quote.
With that out of the way, here is where the money actually goes.
Cost Categories
Agent costs land in four buckets.
1. Model API Costs
This is the most visible cost, and the one that swings the most. It comes down to which model you pick, how many tokens you burn, and which provider you go through:
| Model | Provider | Input/1M tokens | Output/1M tokens |
|---|---|---|---|
| Opus 4.8 | Anthropic | $15.00 | $75.00 |
| Sonnet 4.8 | Anthropic | $3.00 | $15.00 |
| Haiku 4.8 | Anthropic | $0.25 | $1.25 |
| GPT-4.1 | OpenAI | $2.50 | $10.00 |
| GPT-4.1-mini | OpenAI | $0.15 | $0.60 |
| Hermes 3 | Nous Portal | $1.00 | $3.00 |
| Via OpenRouter | Various | $0.10-15.00 | $0.50-75.00 |
A few of those rows need a correction. The Opus 4.8 figure above ($15/$75) looks to be an older Opus-era price. Current sources put Claude Opus 4.8 at roughly $5 input / $25 output per 1M tokens, with a faster mode around $10/$50. That matters, because every per-task and total-cost figure further down this article is built on the higher number, so read the Opus-based dollar amounts as roughly three times what you would actually pay at the current rate.
The Sonnet 4.8 price of $3 input / $15 output is accurate. Haiku is murkier: the $0.25/$1.25 rate matches an earlier Haiku generation, and the current published Haiku tier is referenced as Haiku 4.5 at about $1/$5, so treat the "Haiku 4.8" line as unconfirmed.
The OpenAI rows have the same issue. GPT-4.1 currently runs closer to $2 input / $8 output, not $2.50/$10. And the $0.15/$0.60 line labelled GPT-4.1-mini actually matches GPT-4.1 Nano pricing; the Mini tier sits nearer $0.40/$1.60. The full OpenAI rate card is the place to confirm before you budget.
Hermes 3 sits inside the Nous Portal subscription, which is a real product bundling 300-plus models, though the specific $1/$3 per-million-token rate is not confirmed on the official pricing page and should be read as indicative. The OpenRouter range is a broad illustration of how wide the aggregator's pricing band gets, not a single quotable price.
So what does a real task cost? A genuinely complex coding job (Plan Mode, several files, sub-agents) chews through somewhere around 50K-200K input tokens and 20K-80K output tokens on Opus 4.8. At the article's original Opus price that works out to $1.50-$12.00 per task. At the corrected $5/$25 rate, it is closer to a third of that. These token volumes are a working estimate from typical usage, not a measured benchmark, so your mileage will vary with how much context you load.
Run the same task on Sonnet 4.8 and you are looking at $0.30-$2.40. Hand the routine parts to a Haiku sub-agent and it drops again, to $0.05-$0.50.
2. Infrastructure Costs
| Setup | Monthly Cost | Notes |
|---|---|---|
| Hermes on VPS (2 vCPU/4GB) | ~$5 | Hetzner, DigitalOcean |
| OpenClaw self-hosted | $0 | Runs on existing infrastructure |
| OpenClaw managed (DigitalOcean) | $24 | Includes support |
| OpenHuman subscription | ~$20 | Multi-model routing included |
| Claude Code team | $100 | Per team, not per user |
| Cursor Pro | $20 | Per user |
| GitHub Copilot Business | $19 | Per user |
A note on a few of these. Hermes Agent is free and open-source, so your only cost is inference plus a small box to run it on, and a 2 vCPU/4GB VPS on Hetzner or DigitalOcean really does land around $5/month. OpenClaw's self-hosted core is open-source too, so the software itself is genuinely $0; you bring your own API key. DigitalOcean does offer a one-click OpenClaw deploy, and the total runs roughly $5-45/month depending on the droplet, so the "$24 including support" line is plausible but not an official managed-plan price. The OpenHuman product is real and open-source, but the ~$20/month subscription tier with multi-model routing was not confirmed on an official pricing page, so treat that figure as reported rather than fixed.
One correction worth flagging: the "Claude Code team" row reads $100 per team, but the actual pricing is $100 per seat per month on the annual Premium plan (or $125 monthly), with a five-seat minimum. That is per user, not a flat rate for the whole team, which changes the maths a lot for a five-person crew. Cursor Pro at $20/user and GitHub Copilot Business at $19/user both check out.
3. Engineer Time
This is the hidden cost, and usually the big one. Setting agents up, writing the system prompts, keeping the harness running, and reviewing what the agent produces all eat hours:
| Activity | Time (initial) | Time (ongoing/month) |
|---|---|---|
| Initial setup | 4-16 hours | - |
| Prompt engineering | 4-8 hours | 2-4 hours |
| Harness maintenance | 2-4 hours | 4-8 hours |
| Output review | 30-50% of agent output time | 20-30% after calibration |
| Debugging agent failures | 2-6 hours | 1-3 hours |
Put a senior engineer's rate at $150/hour and the first month of that work runs $2,100-$5,100, settling to $1,050-$2,550/month after that. Worth saying plainly: these are modelled estimates, not figures pulled from a published study, so use them to sanity-check your own numbers rather than as gospel.
4. Error and Rework Costs
Agents get things wrong. When they do, the bill includes:
- Rollback time: undoing bad changes, 15-60 minutes per incident
- Debug time: tracking down the root cause, anywhere from 30 minutes to 4 hours
- Opportunity cost: the higher-value work that did not get done
- Production incidents: the nightmare case, possibly hours of downtime
A well-set-up agent keeps its rollback rate under 5%. A badly set-up one can blow past 20%. Both of those rates are author estimates rather than published benchmarks, but the gap between a good harness and a sloppy one is real, and it is where a lot of the unhappy surprises come from.
Total Cost of Ownership: Example
Take a 5-person engineering team running the 3-agent stack (the one covered in article 12 of this series):
| Category | Monthly Cost |
|---|---|
| Model API (mixed usage) | $200-$500 |
| Hermes VPS | $5 |
| OpenClaw managed | $24 |
| OpenHuman (5 subscriptions) | $100 |
| Engineer time (harness maint) | $1,500 |
| Error/rework (5% rollback) | $300 |
| Total | $2,129-$2,429 |
Now the same team with no agents at all:
| Category | Monthly Cost |
|---|---|
| Engineer time (manual work) | +40% more |
| Context switching overhead | Immeasurable but real |
| Knowledge transfer time | Higher |
| Total (imputed) | $3,500-$4,500 |
That nets out to roughly $1,000-$2,500/month saved for a five-person team, with the return turning positive somewhere in month 2-3 once the team is past the learning curve. Two caveats. First, these are the author's projections, not externally measured results. Second, they lean on the model prices above, so if you redo the API line at the corrected Opus rate the savings get better, not worse. And remember the Claude Code seat pricing correction: if you are paying per seat rather than a flat $100, your subscription line is higher than the table shows.
Cost Optimisation Strategies
- Use the right model for each task: Haiku for the simple stuff, Sonnet for standard work, Opus only when the architecture is genuinely hard
- Cache context: Hermes' FTS5 session search recalls past conversations so you are not reloading the same context every time
- Batch related tasks: one big context load is cheaper than a string of small ones
- Watch your output format: asking for full file rewrites burns far more tokens than asking for diffs
- Lean on prompt caching: Anthropic's prompt caching can cut cached input costs by up to ~90%, and Claude Code uses it (it is context caching, to be precise, not a guaranteed identical-response cache)
- Monitor usage: set budgets and alerts, because token costs can spike without warning
Agentic coding costs real money. Done properly, though, it costs less than the alternative of not doing it.




