
Model Review

Model pricing wars: June 2026 comparison table.

A complete price comparison of all 17 models. Input prices range from free (Llama 4) to $10/1M (Claude Fable 5). Output prices range from free to $50/1M. The full pricing landscape.


Decision

Shortlist

Score tools by workflow fit, data handling, owner readiness, and cost at scale before buying seats.

Risk to watch

Shelfware

A capable tool still fails if nobody owns the workflow or checks whether it is used weekly.

Proof to collect

Pilot score

Run one real task through each shortlisted tool and record quality, time saved, and support burden.

TL;DR

Model capabilities are bunching up, so price has become the thing buyers actually compare. This is a snapshot of where 17 models sit on cost versus benchmark scores, ranked by what a typical workload costs to run. Some of the prices in the table below check out against public pricing trackers; several others don't, and we've flagged them so you don't budget off bad figures.

Key takeaways

  • Capability differences between top and mid-tier models have narrowed, so cost is now the main thing buyers weigh.
  • Verified pricing in the table: Opus 4.8 ($5/$25), Opus 4.7 ($5/$25), Sonnet 4.6 ($3/$15), Fable 5 ($10/$50), GPT-5.5 standard ($5/$30), Mistral Large 2 ($2/$6), and MiniMax M3 (~$0.30/$1.20).
  • Several figures don't check out: the GPT-5.5 Pro, Gemini 3.5 Flash, Gemini 3.1 Pro, Grok 4, Kimi K2.7-Code and GLM-5.2 prices, plus two context windows and two SKU names. Confirm with the vendor before budgeting.
  • For most workloads, a model priced at about $1.20 or less per million output tokens is enough; save the ultra-premium tier for high-stakes tasks.


Analysis

By the middle of 2026, the question that decides most AI buying calls isn't "which model is smartest." It's "which one is cheap enough to run all day without anyone wincing at the bill."

That shift happened fast. A couple of years ago the top models were genuinely far apart on quality, and you paid up for the best one because there wasn't a close substitute. Now the gap between a frontier model and a solid mid-tier one is small enough that, for a lot of everyday work, the cheaper option just does the job. So vendors compete on the one lever left: price.

The numbers have moved a long way. A model that would have counted as frontier-grade in 2024 now runs for less than a dollar per million input tokens. For a business team, that's the headline: the floor has dropped, and most of what you want to do sits comfortably above it.

One caution before the table. AI pricing changes weekly, vendors run promo rates, and "the same model" can mean different SKUs at different prices. We checked these figures against public pricing trackers in June 2026. Some line up exactly. Several don't, and we've said so directly rather than passing them off as gospel.

Complete pricing table

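| Model | Input / 1M | Output / 1M | Combined* | SWE-bench | MMLU | Context |
|---|---|---|---|---|---|---|
| Llama 4 | Free | Free | Free | 50.2% | 84.8% | 256K |
| DeepSeek V3.5 | $0.15 | $0.60 | $1.35 | 52.4% | 85.8% | 1M |
| Gemini 3.5 Flash | $0.35 | $0.70 | $1.75 | 48.2% | 86.8% | 1M |
| MiniMax M3 | $0.30 | $1.20 | $2.70 | 59.0% | 86.4% | 1M |
| Qwen 3 | $0.40 | $1.20 | $2.80 | 46.2% | 84.6% | 128K |
| GPT-5.5 Instant | $0.50 | $1.50 | $3.50 | 42.1% | 84.2% | 128K |
| Kimi K2.7-Code | $0.50 | $2.00 | $4.50 | 56.8% | 85.7% | 256K |
| GLM-5.2 | $0.80 | $2.40 | $5.60 | 51.4% | 85.2% | 256K |
| Mistral Large 2 | $2.00 | $6.00 | $14.00 | 48.6% | 85.1% | 256K |
| Gemini 3.1 Pro | $3.50 | $10.50 | $24.50 | 54.2% | 88.1% | 1M |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $33.00 | 58.1% | 87.6% | 1M |
| Claude Opus 4.8 | $5.00 | $25.00 | $55.00 | 69.2% | 89.8% | 1M |
| Claude Opus 4.7 | $5.00 | $25.00 | $55.00 | 63.8% | 89.2% | 1M |
| Grok 4 | $5.00 | $25.00 | $55.00 | 54.8% | 87.2% | 256K |
| GPT-5.5 | $5.00 | $30.00 | $65.00 | 58.6% | 88.4% | 400K |
| GPT-5.5 Pro | $8.00 | $40.00 | $88.00 | 62.4% | 89.7% | 400K |
| Claude Fable 5 | $10.00 | $50.00 | $110.00 | 80.3% | 92.1% | 1M |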

*Combined = cost of 1M input + 2M output tokens (a typical assistant workload), i.e. the input price plus twice the output price. SWE-bench scores are SWE-bench Pro figures, not SWE-bench Verified, which is worth knowing before you compare them against scores you've seen elsewhere.
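The Combined figure is simple arithmetic, so it is easy to rerun with your own token mix. A minimal Python sketch, assuming the table's 1M input / 2M output workload as the default; the `combined_cost` helper and its parameter names are illustrative, not part of any vendor SDK, and the prices are the table's listed figures rather than verified quotes.

```python
def combined_cost(input_per_m: float, output_per_m: float,
                  input_m: float = 1.0, output_m: float = 2.0) -> float:
    """Dollar cost of a workload of input_m million input tokens and
    output_m million output tokens, given per-1M-token list prices."""
    return round(input_per_m * input_m + output_per_m * output_m, 2)

# A few rows from the table (listed prices, not verified quotes).
rows = {
    "DeepSeek V3.5": (0.15, 0.60),
    "Kimi K2.7-Code": (0.50, 2.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Fable 5": (10.00, 50.00),
}

for name, (inp, out) in rows.items():
    print(f"{name}: ${combined_cost(inp, out):.2f}")
```

For an input-heavy job such as document summarisation, change the mix: `combined_cost(3.00, 15.00, input_m=5, output_m=1)` gives $30.00 for Sonnet 4.6 at its listed rates.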

What checks out, and what doesn't

Before you build a budget on this, here's where the figures stand against public pricing as of June 2026:

  • Confirmed: Claude Opus 4.8 at $5/$25 input/output (CloudZero), with a 69.2% SWE-bench Pro score (Morph LLM leaderboard). Claude Opus 4.7 at $5/$25 (Finout). Claude Sonnet 4.6 at $3/$15 with a 1M context. Claude Fable 5 at $10/$50 and 80.3% SWE-bench Pro. GPT-5.5 standard at $5/$30 (AI Pricing Guru). Mistral Large 2 at $2/$6 (AI Pricing Guru). MiniMax M3 at roughly $0.30/$1.20 on its promo rate, 59% SWE-bench Pro, 1M context (VentureBeat).
  • Roughly right, with caveats: Opus 4.7's SWE-bench Pro is reported closer to 64.3% than 63.8% (Vellum). Qwen 3's $0.40/$1.20 matches the Qwen-Plus mid tier, not "Qwen 3" as one SKU (eesel AI). Llama 4 is genuinely free to run on your own hardware (you still pay for the GPUs).
  • Wrong or unconfirmed, don't budget off these: GPT-5.5 Pro is listed at $8/$40, but actual pricing reportedly runs $30/$180 (PricePerToken). Gemini 3.5 Flash is shown at $0.35/$0.70; reported launch pricing is closer to $1.50/$9.00 (DevTk). Gemini 3.1 Pro is listed at $3.50/$10.50, with reported figures nearer $2.00/$12.00 (DevTk). Grok 4 is shown at $5/$25, reportedly closer to $3/$15 (PricePerToken). Kimi K2.7-Code is listed at $0.50/$2.00, reportedly $0.95/$4.00 (TokenCost). GLM-5.2 is shown at $0.80/$2.40 with 256K context; reported figures are $1.40/$4.40 with a 1M window (CloudPrice). GPT-5.5's context is listed as 400K but is reportedly nearer 1M (Skywork). A distinct "DeepSeek V3.5" and a separate "GPT-5.5 Instant" SKU at the prices shown could not be confirmed against current pricing pages.
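To see what those discrepancies do to a budget, here is a small Python sketch that reruns the Combined workload (1M input + 2M output tokens) on the listed versus reported figures. The reported numbers are the tracker figures quoted in the bullet above, not vendor quotes, so treat the output as indicative only; `workload_cost` is an illustrative helper name.

```python
# Dollars per 1M tokens as (input, output).
listed = {
    "GPT-5.5 Pro": (8.00, 40.00),
    "Gemini 3.5 Flash": (0.35, 0.70),
    "Gemini 3.1 Pro": (3.50, 10.50),
    "Grok 4": (5.00, 25.00),
    "Kimi K2.7-Code": (0.50, 2.00),
    "GLM-5.2": (0.80, 2.40),
}
# Reported figures from the pricing trackers cited above.
reported = {
    "GPT-5.5 Pro": (30.00, 180.00),
    "Gemini 3.5 Flash": (1.50, 9.00),
    "Gemini 3.1 Pro": (2.00, 12.00),
    "Grok 4": (3.00, 15.00),
    "Kimi K2.7-Code": (0.95, 4.00),
    "GLM-5.2": (1.40, 4.40),
}

def workload_cost(prices: tuple) -> float:
    """1M input + 2M output tokens, matching the table's Combined column."""
    inp, out = prices
    return round(inp + 2 * out, 2)

for model in listed:
    before = workload_cost(listed[model])
    after = workload_cost(reported[model])
    print(f"{model}: listed ${before:.2f} -> reported ${after:.2f} "
          f"({after - before:+.2f})")
```

On these inputs GPT-5.5 Pro moves from $88 to $390 per workload and Gemini 3.5 Flash from $1.75 to $19.50, which would take Flash out of the ultra-budget band entirely; Grok 4 goes the other way, from $55 to $33.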

Price-performance tiers

Read these tiers as the shape of the market, not as fixed quotes. The dollar ranges refer to the Combined workload cost in the table, and the bands hold up even where individual cells don't.

Free tier: Llama 4. You pay for infrastructure, not tokens. Best if you're self-hosting on GPUs you already own.

Ultra-budget ($1-3): DeepSeek V3.5, Gemini 3.5 Flash, MiniMax M3. Capable models at very low list prices. On the figures shown, DeepSeek is cheapest on both input and output, MiniMax leads on SWE-bench Pro, and Flash edges the others on MMLU. As flagged above, though, the Flash prices are reportedly wrong and the DeepSeek SKU is unconfirmed, so treat that ranking loosely.

Budget (roughly $3-5): Qwen 3, GPT-5.5 Instant, Kimi K2.7-Code. Each has a lane: Qwen for multilingual work, the Instant tier for teams already in the OpenAI ecosystem, Kimi for coding.

Mid-range ($5-15): GLM-5.2, Mistral Large 2. Premium open-weight models with specific strengths: GLM leans on knowledge tasks, Mistral on European languages.

Premium ($15-35): Gemini 3.1 Pro, Sonnet 4.6. Strong closed models, both with 1M-token contexts.

Ultra-premium ($55+): Opus 4.8, Opus 4.7, Grok 4, GPT-5.5, GPT-5.5 Pro, Fable 5. Top capability, top price. You go here when the work warrants it, not by default.
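If you want to bucket models (or your own negotiated quotes) automatically, here is a minimal Python sketch that assumes hard cutoffs at $3, $5, $15 and $55 of Combined cost. Those cutoffs are an assumption, not the article's definition: applied strictly, they put Qwen 3 ($2.80) in ultra-budget rather than budget, which is one more reason to read the bands as approximate.

```python
# Combined workload cost (1M input + 2M output tokens) per model, in dollars,
# computed from the table's listed prices.
combined_cost_by_model = {
    "Llama 4": 0.00, "DeepSeek V3.5": 1.35, "Gemini 3.5 Flash": 1.75,
    "MiniMax M3": 2.70, "Qwen 3": 2.80, "GPT-5.5 Instant": 3.50,
    "Kimi K2.7-Code": 4.50, "GLM-5.2": 5.60, "Mistral Large 2": 14.00,
    "Gemini 3.1 Pro": 24.50, "Claude Sonnet 4.6": 33.00,
    "Claude Opus 4.8": 55.00, "Claude Opus 4.7": 55.00, "Grok 4": 55.00,
    "GPT-5.5": 65.00, "GPT-5.5 Pro": 88.00, "Claude Fable 5": 110.00,
}

# Hypothetical hard cutoffs: (exclusive upper bound, band name), in order.
BANDS = [
    (0.01, "Free"),
    (3.00, "Ultra-budget"),
    (5.00, "Budget"),
    (15.00, "Mid-range"),
    (55.00, "Premium"),
]

def tier(cost: float) -> str:
    """Return the first band whose upper bound exceeds the cost."""
    for upper, name in BANDS:
        if cost < upper:
            return name
    return "Ultra-premium"

for model, cost in combined_cost_by_model.items():
    print(f"{model:<18} ${cost:>7.2f}  {tier(cost)}")
```

Keeping the cutoffs in one list makes it easy to move a boundary and see which models change band.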

The pricing trend

Prices are dropping faster than capabilities are climbing. A model scoring 85%+ on MMLU and 50%+ on SWE-bench Pro, frontier territory in 2024, now runs for under a dollar per million input tokens. MiniMax M3 is the cleanest example: 86.4% MMLU, 59% SWE-bench Pro, and a promo rate of roughly $0.30 input and $1.20 output per million tokens (VentureBeat). That kind of compression is what's pulling AI into everyday business workflows at scale.
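The 85%/50% bar is easy to check against the table. A short Python sketch, using the listed input prices and scores as shown (the DeepSeek, Kimi and GLM prices are among the unconfirmed figures), filters the sub-$1-input rows for models that clear both bars; the `Model` class and `meets_bar` helper are illustrative names.

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    input_per_m: float    # $ per 1M input tokens, as listed in the table
    swe_bench_pro: float  # SWE-bench Pro score, %
    mmlu: float           # MMLU score, %

# Rows from the table with listed input prices under $1.
models = [
    Model("Llama 4", 0.00, 50.2, 84.8),
    Model("DeepSeek V3.5", 0.15, 52.4, 85.8),
    Model("Gemini 3.5 Flash", 0.35, 48.2, 86.8),
    Model("Qwen 3", 0.40, 46.2, 84.6),
    Model("GPT-5.5 Instant", 0.50, 42.1, 84.2),
    Model("MiniMax M3", 0.30, 59.0, 86.4),
    Model("Kimi K2.7-Code", 0.50, 56.8, 85.7),
    Model("GLM-5.2", 0.80, 51.4, 85.2),
]

def meets_bar(m: Model, mmlu_min: float = 85.0, swe_min: float = 50.0,
              max_input: float = 1.00) -> bool:
    """True if the model clears both benchmark bars under the input-price cap."""
    return (m.mmlu >= mmlu_min and m.swe_bench_pro >= swe_min
            and m.input_per_m < max_input)

print([m.name for m in models if meets_bar(m)])
```

On the table's numbers this returns DeepSeek V3.5, MiniMax M3, Kimi K2.7-Code and GLM-5.2. Only MiniMax's price is confirmed, and if GLM-5.2's reported $1.40 input rate holds, it drops off the list.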

Verdict

The price war is good news if you're buying. The headline gap looks enormous (the cheapest paid input rate in the table is a small fraction of the priciest), but the capability gap is nowhere near that wide. The most extreme version of that comparison leans on the DeepSeek figure, which is one of the unconfirmed ones, so don't quote a precise multiple.

The practical takeaway holds regardless: for most jobs, a model priced at about $1.20 or less per million output tokens will do the work. Keep the ultra-premium models for the tasks where a wrong answer is genuinely expensive, and before you commit a budget, check current vendor pricing yourself, because the figures move and several in this table are off.

Source trail

Primary references to keep this briefing grounded

AI and automation information changes quickly. Use these official or primary references to verify the claims, pricing, product behaviour, and compliance details before committing budget or production data.

What to do next

  1. Write the job-to-be-done before looking at another product.
  2. Score each shortlisted tool for workflow fit, data handling, cost, and owner readiness.
  3. Run one small pilot and remove anything the team does not use weekly.
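Steps 2 and 3 amount to a weighted score plus a usage filter. A minimal Python sketch, assuming 1-5 ratings and equal weights by default; the tool names, ratings and the `used_weekly` flag are hypothetical placeholders, not data from this review.

```python
from typing import Dict, Optional

CRITERIA = ("workflow_fit", "data_handling", "cost", "owner_readiness")

def shortlist_score(ratings: Dict[str, int],
                    weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted average of 1-5 ratings; equal weights unless told otherwise."""
    weights = weights or {c: 1.0 for c in CRITERIA}
    total = sum(weights[c] for c in CRITERIA)
    return round(sum(ratings[c] * weights[c] for c in CRITERIA) / total, 2)

def keep(tool: dict) -> bool:
    """Step 3: drop anything the team did not use weekly during the pilot."""
    return bool(tool["used_weekly"])

# Hypothetical pilot results for two unnamed tools.
pilot = [
    {"name": "Tool A", "used_weekly": True,
     "ratings": {"workflow_fit": 4, "data_handling": 3,
                 "cost": 5, "owner_readiness": 4}},
    {"name": "Tool B", "used_weekly": False,
     "ratings": {"workflow_fit": 5, "data_handling": 4,
                 "cost": 2, "owner_readiness": 2}},
]

for tool in filter(keep, pilot):
    print(tool["name"], shortlist_score(tool["ratings"]))
```

Tool A scores 4.0 on equal weights; Tool B is dropped because nobody used it weekly, even though it rated highest on workflow fit. Pass a `weights` dict if, say, data handling matters more to you than cost.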


AI Kick Start is an Illawarra-based AI studio in Figtree, helping businesses across Wollongong, Shellharbour and Kiama and right across Australia put AI to work.
