Model pricing wars: June 2026 comparison table
Analysis
By the middle of 2026, the question that decides most AI buying calls isn't "which model is smartest." It's "which one is cheap enough to run all day without anyone wincing at the bill."
That shift happened fast. A couple of years ago the top models were genuinely far apart on quality, and you paid up for the best one because there wasn't a close substitute. Now the gap between a frontier model and a solid mid-tier one is small enough that, for a lot of everyday work, the cheaper option just does the job. So vendors compete on the one lever left: price.
The numbers have moved a long way. A model that would have counted as frontier-grade in 2024 now lists for well under a dollar per million input tokens, and around a dollar per million on output. For a business team, that's the headline: the floor has dropped, and most of what you want to do sits comfortably above it.
One caution before the table. AI pricing changes weekly, vendors run promo rates, and "the same model" can mean different SKUs at different prices. We checked these figures against public pricing trackers in June 2026. Some line up exactly. Several don't, and we've said so directly rather than passing them off as gospel.
Complete pricing table
| Model | Input / 1M | Output / 1M | Combined* | SWE-bench | MMLU | Context |
|---|---|---|---|---|---|---|
| Llama 4 | Free | Free | Free | 50.2% | 84.8% | 256K |
| DeepSeek V3.5 | $0.15 | $0.60 | $1.35 | 52.4% | 85.8% | 1M |
| Gemini 3.5 Flash | $0.35 | $0.70 | $1.75 | 48.2% | 86.8% | 1M |
| Qwen 3 | $0.40 | $1.20 | $2.80 | 46.2% | 84.6% | 128K |
| GPT-5.5 Instant | $0.50 | $1.50 | $3.50 | 42.1% | 84.2% | 128K |
| MiniMax M3 | $0.30 | $1.20 | $2.70 | 59.0% | 86.4% | 1M |
| Kimi K2.7-Code | $0.50 | $2.00 | $4.50 | 56.8% | 85.7% | 256K |
| GLM-5.2 | $0.80 | $2.40 | $5.60 | 51.4% | 85.2% | 256K |
| Mistral Large 2 | $2.00 | $6.00 | $14.00 | 48.6% | 85.1% | 256K |
| Gemini 3.1 Pro | $3.50 | $10.50 | $24.50 | 54.2% | 88.1% | 1M |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $33.00 | 58.1% | 87.6% | 1M |
| Claude Opus 4.8 | $5.00 | $25.00 | $55.00 | 69.2% | 89.8% | 1M |
| Claude Opus 4.7 | $5.00 | $25.00 | $55.00 | 63.8% | 89.2% | 1M |
| Grok 4 | $5.00 | $25.00 | $55.00 | 54.8% | 87.2% | 256K |
| GPT-5.5 | $5.00 | $30.00 | $65.00 | 58.6% | 88.4% | 400K |
| GPT-5.5 Pro | $8.00 | $40.00 | $88.00 | 62.4% | 89.7% | 400K |
| Claude Fable 5 | $10.00 | $50.00 | $110.00 | 80.3% | 92.1% | 1M |
*Combined = cost of 1M input tokens plus 2M output tokens (a typical assistant workload). The "SWE-bench" column reflects SWE-bench Pro figures, not Verified, which is worth knowing before you compare these against scores you've seen elsewhere.
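The Combined column can be recomputed from the two per-token rates. The following is a minimal sketch, assuming the footnote's mix of 1M input and 2M output tokens; the function name `combined_cost` and its parameters are illustrative and not part of any vendor's API.

```python
def combined_cost(input_per_m: float, output_per_m: float,
                  input_m: float = 1.0, output_m: float = 2.0) -> float:
    """Dollar cost of a workload given per-million-token rates.

    input_m and output_m are token volumes in millions. The defaults
    mirror the table's footnote (1M input + 2M output).
    """
    return round(input_per_m * input_m + output_per_m * output_m, 2)


# Listed rates from the table (several are flagged as unconfirmed).
print(combined_cost(0.30, 1.20))    # MiniMax M3
print(combined_cost(3.00, 15.00))   # Claude Sonnet 4.6
print(combined_cost(10.00, 50.00))  # Claude Fable 5
```

Changing `input_m` and `output_m` lets you rerun the comparison for a workload that is heavier on input than the footnote assumes, such as long-document summarization.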
What checks out, and what doesn't
Before you build a budget on this, here's where the figures stand against public pricing as of June 2026:
- Confirmed: Claude Opus 4.8 at $5/$25 input/output (CloudZero), with a 69.2% SWE-bench Pro score (Morph LLM leaderboard). Claude Opus 4.7 at $5/$25 (Finout). Claude Sonnet 4.6 at $3/$15 with a 1M context. Claude Fable 5 at $10/$50 and 80.3% SWE-bench Pro. GPT-5.5 standard at $5/$30 (AI Pricing Guru). Mistral Large 2 at $2/$6 (AI Pricing Guru). MiniMax M3 at roughly $0.30/$1.20 on its promo rate, 59% SWE-bench Pro, 1M context (VentureBeat).
- Roughly right, with caveats: Opus 4.7's SWE-bench Pro is reported closer to 64.3% than 63.8% (Vellum). Qwen 3's $0.40/$1.20 matches the Qwen-Plus mid tier, not "Qwen 3" as one SKU (eesel AI). Llama 4 is genuinely free to run on your own hardware (you still pay for the GPUs).
- Wrong or unconfirmed, don't budget off these: GPT-5.5 Pro is listed at $8/$40, but actual pricing reportedly runs $30/$180 (PricePerToken). Gemini 3.5 Flash is shown at $0.35/$0.70; reported launch pricing is closer to $1.50/$9.00 (DevTk). Gemini 3.1 Pro is listed at $3.50/$10.50, with reported figures nearer $2.00/$12.00 (DevTk). Grok 4 is shown at $5/$25, reportedly closer to $3/$15 (PricePerToken). Kimi K2.7-Code is listed at $0.50/$2.00, reportedly $0.95/$4.00 (TokenCost). GLM-5.2 is shown at $0.80/$2.40 with 256K context; reported figures are $1.40/$4.40 with a 1M window (CloudPrice). GPT-5.5's context is listed as 400K but is reportedly nearer 1M (Skywork). A distinct "DeepSeek V3.5" and a separate "GPT-5.5 Instant" SKU at the prices shown could not be confirmed against current pricing pages.
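To see how much these discrepancies matter for a budget, the listed and reported rates can be run through the same 1M-input, 2M-output mix used in the table. This is a rough sketch: the rates are copied from the list above, the reported figures are themselves third-party numbers, and the names `DISCREPANCIES` and `combined` are made up for illustration.

```python
# (listed_input, listed_output, reported_input, reported_output), $ per 1M tokens.
# "Reported" values come from third-party trackers and are not verified here.
DISCREPANCIES = {
    "GPT-5.5 Pro":      (8.00, 40.00, 30.00, 180.00),
    "Gemini 3.5 Flash": (0.35, 0.70, 1.50, 9.00),
    "Gemini 3.1 Pro":   (3.50, 10.50, 2.00, 12.00),
    "Grok 4":           (5.00, 25.00, 3.00, 15.00),
    "Kimi K2.7-Code":   (0.50, 2.00, 0.95, 4.00),
    "GLM-5.2":          (0.80, 2.40, 1.40, 4.40),
}


def combined(input_rate: float, output_rate: float) -> float:
    """Cost in dollars of 1M input tokens plus 2M output tokens."""
    return round(input_rate + 2 * output_rate, 2)


for model, (li, lo, ri, ro) in DISCREPANCIES.items():
    listed, reported = combined(li, lo), combined(ri, ro)
    # A ratio above 1 means the reported price is higher than the table's.
    print(f"{model:<18} listed ${listed:>7.2f}  reported ${reported:>7.2f}  "
          f"ratio {reported / listed:.2f}x")
```

The ratios are not uniform in direction: on these numbers Grok 4 would come out cheaper than listed, while GPT-5.5 Pro and Gemini 3.5 Flash would come out considerably more expensive.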
Price-performance tiers
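Read these tiers as the shape of the market, not as fixed quotes. The dollar ranges refer to the Combined column and are approximate: Qwen 3 ($2.80) and GLM-5.2 ($5.60) each sit just below the band they are grouped in. The bands hold up even where individual cells don't.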
Free tier: Llama 4. You pay for infrastructure, not tokens. Best if you're self-hosting on GPUs you already own.
Ultra-budget ($1-3): DeepSeek V3.5, Gemini 3.5 Flash, MiniMax M3. Capable models at very low list prices. On the figures shown, DeepSeek is cheapest on both input and output, Flash posts the highest MMLU of the three, and MiniMax has the highest SWE-bench Pro score. As flagged above, though, the DeepSeek and Flash numbers here are the unconfirmed ones, so treat that ranking loosely.
Budget ($3-6): Qwen 3, GPT-5.5 Instant, Kimi K2.7-Code. Each has a lane. Qwen for multilingual work, the Instant tier for teams already in the OpenAI ecosystem, Kimi for coding.
Mid-range ($6-15): GLM-5.2, Mistral Large 2. Premium open-weight models with specific strengths: GLM leans on knowledge tasks, Mistral on European languages.
Premium ($15-35): Gemini 3.1 Pro, Sonnet 4.6. Strong closed models, both with 1M-token contexts.
Ultra-premium ($55+): Opus 4.8, Opus 4.7, Grok 4, GPT-5.5, GPT-5.5 Pro, Fable 5. Top capability, top price. You go here when the work warrants it, not by default.
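The bands can be expressed as a small lookup on Combined cost. The sketch below is one possible reading, not the article's method: it assumes half-open bands, sends any non-zero cost under $3 to Ultra-budget, and sends anything at or above $35 to the top tier, because the $35-55 gap is left undefined above. `UPPER_EDGES`, `TIER_NAMES`, and `tier_for` are hypothetical names.

```python
from bisect import bisect_right

# Upper edges of the paid bands in Combined dollars, ascending.
# Half-open bands [low, high) are an assumption, as is assigning
# the undefined $35-55 range to the top tier.
UPPER_EDGES = [3.0, 6.0, 15.0, 35.0]
TIER_NAMES = ["Ultra-budget", "Budget", "Mid-range", "Premium", "Ultra-premium"]


def tier_for(combined: float) -> str:
    """Map a Combined cost (1M input + 2M output, in dollars) to a tier name."""
    if combined == 0:
        return "Free"
    # bisect_right counts how many band edges are <= combined,
    # which is exactly the index of the tier the cost falls into.
    return TIER_NAMES[bisect_right(UPPER_EDGES, combined)]


for name, cost in [("Llama 4", 0.0), ("MiniMax M3", 2.70),
                   ("Mistral Large 2", 14.00), ("Claude Fable 5", 110.00)]:
    print(f"{name}: {tier_for(cost)}")
```

Applied strictly, this rule places Qwen 3 ($2.80) in Ultra-budget and GLM-5.2 ($5.60) in Budget, one band lower than the groupings above, so the edges are better read as guides than as cutoffs.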
The pricing trend
Prices are dropping faster than capabilities are climbing. A model scoring 85%+ on MMLU and 50%+ on SWE-bench Pro, frontier territory in 2024, now lists for well under a dollar per million input tokens and around a dollar per million on output. MiniMax M3 is the cleanest example: 86.4% MMLU, 59% SWE-bench Pro, and a promo rate of roughly $0.30 input and $1.20 output per million tokens (VentureBeat). That kind of compression is what's pulling AI into everyday business workflows at scale.
Verdict
The price war is good news if you're buying. The headline gap looks enormous (the cheapest input rate in the table is a small fraction of the priciest), but the capability gap is nowhere near that wide. The most extreme version of that comparison leans on the DeepSeek figure, which is one of the unconfirmed ones, so don't quote a precise multiple.
The practical takeaway holds regardless: for most jobs, a model priced at about $1.20 per million output tokens or less will do the work. Keep the ultra-premium models for the tasks where a wrong answer is genuinely expensive. Before you commit a budget, check current vendor pricing yourself, because the figures move and a few in this table are off.


