Analysis
If you want to know which AI models developers actually reach for, watching the marketing is a waste of time. Watch where the requests go.
OpenRouter is the plumbing for a big slice of that traffic. It sits between apps and dozens of model providers, and because switching models is a single line of code, the platform sees real choices play out in real time. The company's published State of AI study, run with a16z, looked at roughly 100 trillion tokens of usage. That kind of data is closer to a market signal than any vendor benchmark.
The story it tells is awkward for the expensive end of the market. Budget models are reportedly soaking up traffic, premium models are losing it, and the thing developers increasingly optimise for is not the last few points on a benchmark. It's price and how much context the model can hold at once. For an Australian team deciding where to spend an AI budget, that's the headline: the gap between "good enough" and "best in class" is narrowing, and the price gap is not.
A caution before the numbers. Several of the precise figures and one or two of the model names below come from a reported mid-2026 OpenRouter snapshot that we could not confirm against the company's own publications. We've flagged those as reported rather than established. The overall pattern, though, holds up across independent reporting.
The Rise of Budget Models
The clearest move is toward cheap models. By the reported mid-2026 snapshot, models priced under $1 per million input tokens had grown from about 18% of OpenRouter's request volume in January to roughly 41% by June. (These exact share figures are reported and unconfirmed.) The named winners in that account included DeepSeek (reported as "V3.5", a version that does not actually exist, DeepSeek's real 2026 line runs V3.2 then the V4 family), Gemini 3.5 Flash, and MiniMax M3, released on 1 June 2026 with a 1M-token context window.
Worth a correction here: Gemini 3.5 Flash is real, but it isn't actually a sub-$1 model. It's priced at $1.50 per million input and $9 per million output, so grouping it with the under-$1 tier is wrong.
The economics behind the shift are simple. As models converge on capability, the premium for a marginal improvement gets harder to justify. A team building a content moderation pipeline cares about accuracy and cost, not whether a model scores 86% or 82% on MMLU-Pro. When a budget model does the job at a fraction of the price of a flagship, the decision makes itself.

Context as the Key Selection Criterion
After price, the reported snapshot puts context window size as the next biggest factor in picking a model. Requests asking for more than 128K tokens of context reportedly grew from 8% of the total in January to 27% in June. (Unconfirmed figures, though OpenRouter's published study does document rising average sequence length.) Models with million-token contexts get picked for these jobs even when their per-token price is higher.
That tracks with how the work is splitting. Short-context tasks, quick answers, simple text generation, increasingly go to budget models or smaller specialised systems. What's left for the frontier models is the work that genuinely needs the long context: reading whole codebases, reviewing stacks of documents, reasoning across a large body of information at once.
The Switching Dynamic
Because switching models on OpenRouter is one parameter change, developers do it constantly. The reported snapshot puts the average account at 3.2 models in regular use, up from 1.8 in early 2025. (Unconfirmed.) That's commoditisation in action: when models are close to interchangeable, you use a different one for each job and optimise cost and capability per request.
The same account reports low loyalty, of developers who had GPT-5.5 as their primary model in January 2026, only 38% still did by June, with the rest moving to cheaper options, more capable ones, or juggling several. Treat that one with real skepticism: GPT-5.5 didn't launch until 23 April 2026, so nobody could have had it as a primary model in January. The retention breakdown appears to be invented.


