The $1-per-million club: Cheapest capable models
Analysis
A year ago, running a million tokens through a decent model cost real money. Now the floor has dropped so far that "good enough" has stopped being expensive. That's the story underneath all the numbers: the cheap end of the market has caught up to where the frontier sat not long ago, and for a lot of everyday business work, you no longer need to pay flagship prices.
That's the genuine shift. The catch is the specifics. A list has been doing the rounds claiming exactly four models now sit under $1 per million input tokens, each with neat benchmark scores to match. When we went looking, most of those figures didn't line up with what the vendors actually publish. Some of the models don't appear to exist under the names given. So read what follows as a map of how to weigh cheap models against each other, not as a verified shopping list.
Here's why it matters for your team: if you're doing bulk document work, monitoring, classification, or first-draft generation, the cost difference between the cheap tier and a flagship model is large enough to change what's worth automating at all. The trick is matching the model to the job and confirming today's price yourself.
The contenders
| Model | Input Price | Output Price | SWE-bench Pro | MMLU | Context |
|---|---|---|---|---|---|
| DeepSeek V3.5 | $0.15 / 1M | $0.60 / 1M | 52.4% | 85.8% | 1M |
| Gemini 3.5 Flash | $0.35 / 1M | $0.70 / 1M | 48.2% | 86.8% | 1M |
| Qwen 3 | $0.40 / 1M | $1.20 / 1M | 46.2% | 84.6% | 128K |
| GPT-5.5 Instant | $0.50 / 1M | $1.50 / 1M | 42.1% | 84.2% | 128K |
A caution before you read the table as gospel: we couldn't confirm most of these figures, and a few clash with what the vendors publish. There's no "DeepSeek V3.5" in DeepSeek's own change log, the documented line runs V3, V3.1, V3.2 and V4, with V3.2 priced nearer $0.28 input / $0.42 output on a roughly 131K context. Gemini 3.5 Flash is real, but reported I/O 2026 pricing lands closer to $1.50 input, which would put it above the sub-$1 line, not under it. And while GPT-5.5 shipped in April 2026, its standard API pricing is reported around $5 / $30 per million on a 1M context, nowhere near $0.50 / $1.50. The SWE-bench Pro scores below are also unconfirmed and sit lower than the public leaderboard numbers we'd expect for named flagships. So take the ranking as reasoning about value, and verify the live numbers before you spend anything.
Ranking by value
1. DeepSeek V3.5, Best overall value. On the figures quoted ($0.15/$0.60, 1M context, 52.4% SWE-bench Pro, 85.8% MMLU), this would be the cheapest capable option with the longest context, and the open licence sweetens it further. Worth flagging: a model under exactly that name and price doesn't appear in DeepSeek's docs, so the real-world equivalent is more likely V3.2 or V4. Either way, for bulk document processing, monitoring and analysis, DeepSeek's cheap tier is hard to beat on cost per token.
2. Gemini 3.5 Flash, Best balance. The pitch is a small premium over DeepSeek in exchange for a higher MMLU (86.8%) and Google's production reliability. The $0.35/$0.70 pricing quoted here is the part to double-check, reported figures are several times higher, which would knock it out of the sub-$1 club entirely. If the cheap price holds where you are, Flash is a sensible default for production. If it doesn't, the reliability argument still stands, just at a higher cost.
3. Qwen 3, Best for multilingual. At a quoted $0.40/$1.20, Qwen 3 reads as the cheapest route for Asian-language work. The exact SKU and the 128K context are unconfirmed, the current Qwen line tends to ship with much larger windows, but the multilingual strength is the real draw here. If your workload leans into non-English content, this is the one to trial.
4. GPT-5.5 Instant, Best ecosystem integration. Instant is pitched as the priciest and weakest of the four on paper, earning its place through tight integration with OpenAI's platform. The $0.50/$1.50, 128K-context SKU quoted here isn't one we could verify against OpenAI's published pricing, which runs far higher. If you're already inside OpenAI's tooling, it's the path of least friction, just don't assume the cheap price tag.
The price-performance curve
The headline point survives the messy details: the gap between cheap and capable has narrowed sharply. The quoted scores put all four above 42% on SWE-bench Pro and above 84% on MMLU. The MMLU range is broadly believable for capable 2026 models; the SWE-bench numbers we couldn't confirm. The MMLU framing is the safer one to lean on, that level of general knowledge would have read as frontier performance a couple of years back, and now it's showing up at budget prices. The cheapening of AI is real even if these particular figures aren't nailed down.
Verdict
If you want maximum value, start by trialling DeepSeek's cheapest current model, but check which SKU is actually live, since "V3.5" may not be it. Step up to Gemini 3.5 Flash if you need Google's infrastructure or a bit more general knowledge, and confirm the price first, because it may sit above the sub-$1 line. Reach for Qwen 3 for multilingual work, and pick GPT-5.5 Instant only if you're already committed to OpenAI's stack. Across all of them, the rule is the same: verify today's pricing on the vendor's own page before you build anything on top of it.



