The open-weights advantage: Why open models are winning
Two years ago, if you wanted the best AI, you paid for a closed model and you didn't really argue about it. The open-weights alternatives were cheaper, sure, but they trailed badly enough that most teams treated them as a science project rather than a serious option.
That has changed. By June 2026 the gap has narrowed to the point where, outside the very top tier, open models are holding their own against the paid ones, and on price they're not even in the same conversation. An open model like MiniMax M3 now matches GPT-5.5 on a standard coding benchmark while costing a fraction as much (VentureBeat).
For a business team, the practical question has flipped. It used to be "can we get away with an open model?" Now it's "do we actually have a reason to pay for a closed one?" For a lot of workloads, the honest answer is no.
A note before the numbers: most of the benchmark figures below are self-reported by the vendors and aren't independently verified, and a few that float around the comparison sites don't hold up at all. Treat the tables as a rough picture of the landscape, not gospel.
The capability gap has closed
Here's how the top open and closed models stack up across coding tiers. The anchor figures (Opus 4.8 and MiniMax M3) are corroborated; the rest are vendor-claimed or, in a couple of cases, hard to source at all, so read the table as illustrative.
| Tier | Best closed (SWE-bench Pro) | Best open (SWE-bench Pro) | Gap |
|---|---|---|---|
| Elite coding | Opus 4.8 (69.2%) | None | Closed leads |
| Strong coding | GPT-5.5 Pro (62.4%) | MiniMax M3 (59.0%) | 3.4 pts, closed ahead |
| Mid coding | Sonnet 4.6 (58.1%) | Kimi K2.7-Code (56.8%) | 1.3 pts, closed ahead |
| Entry coding | Gemini 3.5 Flash (48.2%) | Mistral Large 2 (48.6%) | 0.4 pts, open ahead |
At the very top, closed still wins. No open model touches Opus 4.8, which Anthropic reports at 69.2% on SWE-bench Pro (LLM-Stats). Below the elite tier, though, the picture gets blurry fast. MiniMax M3 lands at 59.0% on SWE-bench Pro per the vendor's own figures (VentureBeat). Against the table's 62.4% for GPT-5.5 Pro that looks like a 3.4-point deficit, but measured against the leaderboard figure for GPT-5.5, MiniMax M3 actually edges ahead rather than trailing.
A few caveats worth carrying. The "GPT-5.5 Pro" line at 62.4% doesn't match what the coding leaderboards show; the reported SWE-bench Pro figure for GPT-5.5 is closer to 58.6% (morphllm). The mid-tier and entry-tier rows are shakier still: standardised SWE-bench Pro scores for Sonnet 4.6, Kimi K2.7-Code, Gemini 3.5 Flash, and Mistral Large 2 are mostly not published, and the Mistral Large 2 number in particular looks far too high for a 2024-era model. So the trend is real, but several of these cells are not.
The takeaway holds even after you discount the soft numbers: open models have caught up everywhere except the frontier, and they did it while costing a rounding error.
The structural advantages of openness
These advantages don't depend on any benchmark. They're properties of how open weights work, and they're the part closed vendors can't paper over (ComputingForGeeks).
1. Privacy. You can run an open model on your own hardware, including air-gapped systems with no internet connection. For healthcare, finance, defence, and government, that isn't a nice-to-have. A closed model can't match it at any capability level, because using one means your data has to leave your building.
2. Customisation. Open weights can be fine-tuned on your own data. A fine-tuned Llama 4 will often beat a stronger generalist closed model on your specific domain tasks, even if it loses on the headline benchmark. The model that knows your work beats the model that knows everyone's.
3. Predictable costs. Self-hosting turns AI into a fixed cost (the hardware) instead of a variable one (per-token API billing). At scale, knowing your number in advance is worth a lot to whoever signs off the budget.
4. No vendor lock-in. Open models move. You can shift hosting providers, pull everything on-premise, or push it out to the edge. A closed model ties you to one vendor's infrastructure and one vendor's pricing, and you find out how much that matters the day they change the terms.
5. Community innovation. Thousands of researchers and developers keep improving the open ecosystem around these models: quantisation, inference engines, fine-tuning methods. That work stacks up over time, and you get it for free.
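Quantisation is a large part of why running a model on your own hardware is practical at all. The sketch below is a back-of-the-envelope estimate under a simplifying assumption: the memory needed for the weights is roughly parameter count times bits per weight divided by eight. The 70-billion-parameter size is hypothetical and isn't meant to describe any model named in this article, and the estimate ignores the KV cache, activations and runtime overhead, so a real deployment needs headroom on top of it.

```python
# Back-of-the-envelope weight-memory estimate for self-hosting an open model.
# Assumption: weight memory ~= parameter count * bits per weight / 8.
# Ignores KV cache, activations and runtime overhead, so treat results as a floor.


def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in gigabytes (1e9 bytes)."""
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 1e9


if __name__ == "__main__":
    params = 70e9  # hypothetical 70B-parameter model, not a specific release
    for label, bits in [("FP16", 16), ("INT8", 8), ("4-bit", 4)]:
        print(f"{label:>5}: ~{weight_memory_gb(params, bits):.0f} GB")
```

On these assumptions the same model drops from about 140 GB at FP16 to about 35 GB at 4 bits. That is the difference between needing a multi-GPU server and fitting on far more modest hardware, and it is exactly the kind of community-built tooling the fifth point describes.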
The pricing advantage
This is where the argument stops being close. The price spread is enormous.
| Model | Input price ($ per 1M tokens) | SWE-bench Pro | Input $ per SWE-bench Pro point |
|---|---|---|---|
| Opus 4.8 | $5.00 | 69.2% | $0.072 |
| GPT-5.5 Pro | $8.00 | 62.4% | $0.128 |
| MiniMax M3 | $0.30 | 59.0% | $0.005 |
| DeepSeek V3.5 | $0.15 | 52.4% | $0.003 |
| Llama 4 | Free (self-hosted) | 50.2% | $0.000 |
The two figures you can lean on: Opus 4.8 at $5.00 per million input tokens (morphllm), and MiniMax M3 at $0.30 per million input tokens (OpenRouter). On those two alone you're paying roughly one-seventeenth the price for a model that's within shouting distance on the benchmark.
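The per-point column is simply input price divided by SWE-bench Pro score. The short sketch below reproduces that column and the "one-seventeenth" ratio using only the two corroborated figures. Like the table, it counts input tokens only. Output-token pricing is left out, which is a simplification, and the data structure and function names are illustrative.

```python
# Reproduce the "input $ per SWE-bench Pro point" column and the price ratio
# for the two corroborated rows. Prices are USD per million input tokens and
# scores are SWE-bench Pro percentages, as quoted in this article.
# Output-token pricing is deliberately ignored, matching the table.

MODELS = {
    "Opus 4.8": {"input_price": 5.00, "swe_bench_pro": 69.2},
    "MiniMax M3": {"input_price": 0.30, "swe_bench_pro": 59.0},
}


def price_per_point(input_price: float, score: float) -> float:
    """Input price divided by benchmark score (lower means better value)."""
    return input_price / score


def compare(cheap: str, premium: str) -> tuple[float, float]:
    """Return (price ratio, score ratio) of the cheaper model relative to the premium one."""
    c, p = MODELS[cheap], MODELS[premium]
    price_ratio = c["input_price"] / p["input_price"]
    score_ratio = c["swe_bench_pro"] / p["swe_bench_pro"]
    return price_ratio, score_ratio


if __name__ == "__main__":
    for name, m in MODELS.items():
        per_point = price_per_point(m["input_price"], m["swe_bench_pro"])
        print(f"{name}: ${per_point:.3f} per point")
    price_ratio, score_ratio = compare("MiniMax M3", "Opus 4.8")
    print(f"MiniMax M3 costs {price_ratio:.0%} of Opus 4.8's input price "
          f"(about 1/{1 / price_ratio:.0f}) for {score_ratio:.0%} of its score")
```

Keeping the raw inputs in one place makes it easy to rerun the comparison when a vendor revises a score or a price, which, given how soft several of these numbers are, is likely to happen.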
The rest of this table needs flagging. The $8.00 input price for GPT-5.5 Pro doesn't appear in the coding leaderboards, which list GPT-5.5 closer to $5.00 input. "DeepSeek V3.5" doesn't appear to be a real release at all; DeepSeek's actual 2026 line-up is V3.2 and the V4-Pro / V4-Flash models, with different scores and prices (DeepSeek API Docs). And the claimed 50.2% SWE-bench Pro score for Llama 4 runs well above its documented results, which sit far lower. Llama 4 being free to self-host is accurate; the score next to it is not.
So the headline that "DeepSeek V3.5 delivers 75% of Opus 4.8's coding performance at 3% of the price" rests on a model that doesn't seem to exist, and you should treat it as unconfirmed. The real version of the point still lands, though: with MiniMax M3 you're getting roughly 85% of Opus 4.8's score for about 6% of its input price, and for most jobs that trade is hard to argue with.
When closed models still win
Open isn't always the answer. Three situations where a closed model is the right call:
- Maximum capability. When a mistake is genuinely expensive (medical diagnosis, legal advice), you want the best model available, and right now that's still closed.
- Ecosystem integration. When you need vendor-specific plumbing, like OpenAI's Assistants API or Anthropic's tool use, the closed product is doing work an open model won't.
- Convenience. If you don't have the infrastructure or the people to self-host, paying for an API is the cheaper option once you count the engineering time you'd otherwise spend.
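The convenience point and the earlier fixed-versus-variable cost argument come down to the same break-even calculation. The sketch below is hypothetical. The hardware and engineering figures are placeholders, not quotes, and only the two per-token prices come from this article. It also simplifies by ignoring output tokens and by assuming one self-hosted box can absorb the whole volume.

```python
# Hypothetical break-even sketch: per-token API billing vs. self-hosting.
# Hardware and engineering costs below are placeholders for illustration;
# only the per-million-token input prices are taken from this article.
# Simplifications: input tokens only, and self-hosted capacity is assumed unlimited.


def monthly_api_cost(tokens_millions: float, price_per_million: float) -> float:
    """Variable cost: billed per million tokens processed."""
    return tokens_millions * price_per_million


def monthly_self_host_cost(hardware_monthly: float, engineering_monthly: float) -> float:
    """Fixed cost: amortised hardware plus the people needed to run it."""
    return hardware_monthly + engineering_monthly


def break_even_tokens_millions(price_per_million: float,
                               hardware_monthly: float,
                               engineering_monthly: float) -> float:
    """Monthly volume (millions of tokens) above which self-hosting is cheaper."""
    fixed = monthly_self_host_cost(hardware_monthly, engineering_monthly)
    return fixed / price_per_million


if __name__ == "__main__":
    hardware = 4_000.0      # hypothetical amortised server cost per month (USD)
    engineering = 12_000.0  # hypothetical share of engineer time per month (USD)
    for name, price in [("Opus 4.8 input", 5.00), ("MiniMax M3 input", 0.30)]:
        tokens = break_even_tokens_millions(price, hardware, engineering)
        print(f"{name}: self-hosting pays off above ~{tokens:,.0f}M tokens/month")
```

With these placeholder costs, self-hosting only beats a $5.00 API above roughly 3.2 billion input tokens a month, and it only beats a $0.30 hosted open model somewhere above 53 billion. Below those volumes, paying per token is the cheaper option, which is the convenience argument in numbers.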
Verdict
Open-weights models have gone from "interesting alternative" to "reasonable default." MiniMax M3 and Llama 4 offer combinations of capability, price, and flexibility that closed models can't touch outside the very top tier, and the gap at the frontier keeps shrinking (VentureBeat).
For most teams, the sensible move now is to start with an open model and only reach for a closed one when you have a specific reason. That's close to the opposite of where the advice sat two years ago.


