AI News

OpenRouter's State of AI: Token Trends and the Quiet Model Shift Underway.

OpenRouter's usage data reveals a market in transition. We analyse the token flow trends that show which models developers are actually using, and which they are abandoning.

Daniel Fleuren2026-06-1911 min readFounders and operatorsUpdated 2026-06-19

Written by

Daniel Fleuren

Founder, AI Kick Start. 20+ years enterprise IT

Updated 2026-06-19

AI Kick Start editorial image for OpenRouter's State of AI: Token Trends and the Quiet Model Shift Underway.

Decision

Shortlist

Score tools by workflow fit, data handling, owner readiness, and cost at scale before buying seats.

Risk to watch

Shelfware

A capable tool still fails if nobody owns the workflow or checks whether it is used weekly.

Proof to collect

Pilot score

Run one real task through each shortlisted tool and record quality, time saved, and support burden.

TL;DR

TL;DR: A reported OpenRouter usage snapshot from mid-2026 points to a clear change in how developers pick AI models. Cheaper models, several of them from Chinese labs, are reportedly taking share fast; GPT-class premium models are slipping; and context window size has become a top reason to choose one model over another. The broad direction is well documented even where the exact percentages are not: the model market is starting to behave like a commodity, which is a harder place to sell a premium product.

Key takeaways

Budget models grew from 18% to 41% of OpenRouter request volume in H1 2026 (reported figure, unconfirmed; directionally backed by [a16z's State of AI study](https://a16z.com/state-of-ai/) showing Chinese-origin models rising from under 2% to ~45% of token share within a year)
Premium models declined from 35% to 22% share over the same period (reported figure, unconfirmed; premium/Anthropic share decline is broadly reported)
Long-context requests (128K+ tokens) grew from 8% to 27% of total requests (reported figure, unconfirmed; rising sequence length is documented in [OpenRouter's study](https://openrouter.ai/state-of-ai))
The average developer account now uses 3.2 different models regularly (reported figure, unconfirmed)

Analysis

If you want to know which AI models developers actually reach for, watching the marketing is a waste of time. Watch where the requests go.

OpenRouter is the plumbing for a big slice of that traffic. It sits between apps and dozens of model providers, and because switching models is a single line of code, the platform sees real choices play out in real time. The company's published State of AI study, run with a16z, looked at roughly 100 trillion tokens of usage. That kind of data is closer to a market signal than any vendor benchmark.

The story it tells is awkward for the expensive end of the market. Budget models are reportedly soaking up traffic, premium models are losing it, and the thing developers increasingly optimise for is not the last few points on a benchmark. It's price and how much context the model can hold at once. For an Australian team deciding where to spend an AI budget, that's the headline: the gap between "good enough" and "best in class" is narrowing, and the price gap is not.

A caution before the numbers. Several of the precise figures and one or two of the model names below come from a reported mid-2026 OpenRouter snapshot that we could not confirm against the company's own publications. We've flagged those as reported rather than established. The overall pattern, though, holds up across independent reporting.

The Rise of Budget Models

The clearest move is toward cheap models. By the reported mid-2026 snapshot, models priced under $1 per million input tokens had grown from about 18% of OpenRouter's request volume in January to roughly 41% by June. (These exact share figures are reported and unconfirmed.) The named winners in that account included DeepSeek (reported as "V3.5", a version that does not actually exist, DeepSeek's real 2026 line runs V3.2 then the V4 family), Gemini 3.5 Flash, and MiniMax M3, released on 1 June 2026 with a 1M-token context window.

Worth a correction here: Gemini 3.5 Flash is real, but it isn't actually a sub-$1 model. It's priced at $1.50 per million input and $9 per million output, so grouping it with the under-$1 tier is wrong.

The economics behind the shift are simple. As models converge on capability, the premium for a marginal improvement gets harder to justify. A team building a content moderation pipeline cares about accuracy and cost, not whether a model scores 86% or 82% on MMLU-Pro. When a budget model does the job at a fraction of the price of a flagship, the decision makes itself.

Supporting AI Kick Start editorial image for openrouter-state-of-ai-token-trends-model-shifts. — Generated AI Kick Start editorial visual used to explain the article's practical workflow and trade-offs.

The Decline of Premium Models

The reported snapshot shows the mirror image at the top. Premium models, priced above $5 per million input tokens, fell from a reported 35% of volume to about 22%. GPT-5.5, a real model OpenAI shipped on 23 April 2026, reportedly dropped from 22% to 14%. Claude Opus 4.8, released 28 May 2026 and pitched as a coding leader, reportedly held near 6%. (All three share figures are unconfirmed.)

One detail in that account is plainly wrong: GPT-5.5 Pro was described at "$8/$40", but OpenAI's actual pricing is $30 per million input and $180 per million output. Its reported slide from 4% to 2% share is also unconfirmed. The ultra-premium tier looks like a niche either way.

The odd one out is Claude Fable 5. The underlying event is real: Anthropic launched Fable 5 (and Mythos 5) on 9 June 2026 and suspended access on 12 June after a US government directive. It was reportedly pulling around 3% of share in that brief window, though that figure is unconfirmed. The demand for a top-capability premium model was there. The supply got cut off.

Context as the Key Selection Criterion

After price, the reported snapshot puts context window size as the next biggest factor in picking a model. Requests asking for more than 128K tokens of context reportedly grew from 8% of the total in January to 27% in June. (Unconfirmed figures, though OpenRouter's published study does document rising average sequence length.) Models with million-token contexts get picked for these jobs even when their per-token price is higher.

That tracks with how the work is splitting. Short-context tasks, quick answers, simple text generation, increasingly go to budget models or smaller specialised systems. What's left for the frontier models is the work that genuinely needs the long context: reading whole codebases, reviewing stacks of documents, reasoning across a large body of information at once.

The Switching Dynamic

Because switching models on OpenRouter is one parameter change, developers do it constantly. The reported snapshot puts the average account at 3.2 models in regular use, up from 1.8 in early 2025. (Unconfirmed.) That's commoditisation in action: when models are close to interchangeable, you use a different one for each job and optimise cost and capability per request.

The same account reports low loyalty, of developers who had GPT-5.5 as their primary model in January 2026, only 38% still did by June, with the rest moving to cheaper options, more capable ones, or juggling several. Treat that one with real skepticism: GPT-5.5 didn't launch until 23 April 2026, so nobody could have had it as a primary model in January. The retention breakdown appears to be invented.

Source trail

Primary references to keep this briefing grounded

AI and automation information changes quickly. Use these official or primary references to verify the claims, pricing, product behaviour, and compliance details before committing budget or production data.

OpenRouter documentation

What to do next

Write the job-to-be-done before looking at another product.
Score each shortlisted tool for workflow fit, data handling, cost, and owner readiness.
Run one small pilot and remove anything the team does not use weekly.

Want help applying this? Explore the AI tools directory.

AI Kick Start is an Illawarra-based AI studio in Figtree, helping businesses across Wollongong, Shellharbour and Kiama and right across Australia put AI to work.

Explore with AI

Use the article as a decision prompt

Summarise this AI Kick Start article for an Australian business owner. Focus on the useful decision, the risks, and the first practical next step: OpenRouter's State of AI: Token Trends and the Quiet Model Shift Underway

Read with ChatGPT Open Claude Search with AI Mode

Turn this into a practical roadmap.

Use the guide as a starting point, then map the first workflow worth building.

Book an AI strategy call