Model Review

The open-weights advantage: Why open models are winning.

MiniMax M3 (59.0% SWE-bench Pro), DeepSeek V3.5 (52.4%), and Llama 4 (free) prove open-weights models can compete with closed alternatives. We analyse why openness is becoming the default choice.

Decision: Shortlist. Score tools by workflow fit, data handling, owner readiness, and cost at scale before buying seats.

Risk to watch: Shelfware. A capable tool still fails if nobody owns the workflow or checks whether it is used weekly.

Proof to collect: Pilot score. Run one real task through each shortlisted tool and record quality, time saved, and support burden.

TL;DR

MiniMax M3 (59.0% on SWE-bench Pro), DeepSeek V3.5 (52.4%, unconfirmed), and Llama 4 (free to self-host) suggest open-weights models can compete with closed alternatives. We analyse why openness is becoming the default choice.

Key takeaways

  • The open-weights advantage: Why open models are winning: Two years ago, if you wanted the best AI, you paid for a closed model and you didn't really argue about it.
  • The capability gap has closed: Here's how the top open and closed models stack up across coding tiers.
  • The structural advantages of openness: These advantages don't depend on any benchmark.
  • The pricing advantage: This is where the argument stops being close.
  • When closed models still win: Open isn't always the answer.

The open-weights advantage: Why open models are winning

Two years ago, if you wanted the best AI, you paid for a closed model and you didn't really argue about it. The open-weights alternatives were cheaper, sure, but they trailed badly enough that most teams treated them as a science project rather than a serious option.

That has changed. By June 2026 the gap has narrowed to the point where, outside the very top tier, open models are holding their own against the paid ones, and on price they're not even in the same conversation. An open model like MiniMax M3 now matches GPT-5.5 on a standard coding benchmark while costing a fraction as much (VentureBeat).

For a business team, the practical question has flipped. It used to be "can we get away with an open model?" Now it's "do we actually have a reason to pay for a closed one?" For a lot of workloads, the honest answer is no.

A note before the numbers: most of the benchmark figures below are self-reported by the vendors and aren't independently verified, and a few that float around the comparison sites don't hold up at all. Treat the tables as a rough picture of the landscape, not gospel.

The capability gap has closed

Here's how the top open and closed models stack up across coding tiers. The anchor figures (Opus 4.8 and MiniMax M3) are corroborated; the rest are vendor-claimed or, in a couple of cases, hard to source at all, so read the table as illustrative.

| Tier | Best closed | Best open | Gap |
| --- | --- | --- | --- |
| Elite coding | Opus 4.8 (69.2%) | (none) | Closed leads |
| Strong coding | GPT-5.5 Pro (62.4%) | MiniMax M3 (59.0%) | 3.4 pts |
| Mid coding | Sonnet 4.6 (58.1%) | Kimi K2.7-Code (56.8%) | 1.3 pts |
| Entry coding | Gemini 3.5 Flash (48.2%) | Mistral Large 2 (48.6%) | Open leads |

At the very top, closed still wins. No open model touches Opus 4.8, which Anthropic reports at 69.2% on SWE-bench Pro (LLM-Stats). Below the elite tier, though, the picture gets blurry fast. MiniMax M3 lands at 59.0% on SWE-bench Pro per the vendor's own figures. Set against the GPT-5.5 score the coding leaderboards report, rather than the 62.4% in the table, that edges past GPT-5.5 instead of trailing it (VentureBeat).

A few caveats worth carrying. The "GPT-5.5 Pro" line at 62.4% doesn't match what the coding leaderboards show; the reported SWE-bench Pro figure for GPT-5.5 is closer to 58.6% (morphllm). The mid-tier and entry-tier rows are shakier still: standardised SWE-bench Pro scores for Sonnet 4.6, Kimi K2.7-Code, Gemini 3.5 Flash, and Mistral Large 2 are mostly not published, and the Mistral Large 2 number in particular looks far too high for a 2024-era model. So the trend is real, but several of these cells are not reliable.

The takeaway holds even after you discount the soft numbers: open models have caught up everywhere except the frontier, and they did it at a price that is a rounding error next to the closed models.

The structural advantages of openness

These advantages don't depend on any benchmark. They're properties of how open weights work, and they're the part closed vendors can't paper over (ComputingForGeeks).

1. Privacy. You can run an open model on your own hardware, including air-gapped systems with no internet connection. For healthcare, finance, defence, and government, that isn't a nice-to-have. A closed model can't match it at any capability level, because the data has to leave your building to use it.

2. Customisation. Open weights can be fine-tuned on your own data. A fine-tuned Llama 4 will often beat a stronger generalist closed model on your specific domain tasks, even if it loses on the headline benchmark. The model that knows your work beats the model that knows everyone's.

3. Predictable costs. Self-hosting turns AI into a fixed cost (the hardware) instead of a variable one (per-token API billing). At scale, knowing your number in advance is worth a lot to whoever signs off the budget.

4. No vendor lock-in. Open models move. You can shift hosting providers, pull everything on-premise, or push it out to the edge. A closed model ties you to one vendor's infrastructure and one vendor's pricing, and you find out how much that matters the day they change the terms.

5. Community innovation. Thousands of researchers and developers keep improving the open ecosystem around these models: quantisation, inference engines, fine-tuning methods. That work stacks up over time, and you get it for free.
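Point 3 (predictable costs) comes down to a break-even volume: the monthly token count above which a fixed hosting bill is cheaper than per-token billing. A rough Python sketch of that arithmetic follows. Both dollar figures are made-up placeholders rather than quotes, the function name is illustrative, and the sketch ignores output tokens and staff time.

```python
def break_even_million_tokens(fixed_monthly_usd: float, api_usd_per_million: float) -> float:
    """Monthly volume (in millions of tokens) at which self-hosting matches API spend."""
    if api_usd_per_million <= 0:
        raise ValueError("API price must be positive")
    return fixed_monthly_usd / api_usd_per_million


# Hypothetical fixed cost for hardware and power; replace with your own number.
HOSTING_USD_PER_MONTH = 4_000.0

# Placeholder API price in USD per million input tokens.
API_USD_PER_MILLION = 5.00

volume = break_even_million_tokens(HOSTING_USD_PER_MONTH, API_USD_PER_MILLION)
print(f"Break-even at about {volume:,.0f} million input tokens per month")
```

Below that volume the API is the cheaper route; above it, the fixed cost wins and stays flat as usage grows.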

The pricing advantage

This is where the argument stops being close. The price spread is enormous.

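| Model | Input price (per 1M tokens) | SWE-bench Pro | $ per SWE-bench point |
| --- | --- | --- | --- |
| Opus 4.8 | $5.00 | 69.2% | $0.072 |
| GPT-5.5 Pro | $8.00 | 62.4% | $0.128 |
| MiniMax M3 | $0.30 | 59.0% | $0.005 |
| DeepSeek V3.5 | $0.15 | 52.4% | $0.003 |
| Llama 4 | Free | 50.2% | $0.000 |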
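The last column of the table is the input price divided by the benchmark score. A minimal Python sketch of that arithmetic, using the table's figures exactly as printed (several of them unverified); the names `rows` and `dollars_per_point` are illustrative only:

```python
# Figures copied from the table above: (input price in USD per 1M tokens, SWE-bench Pro %).
# Llama 4 is listed as free to self-host, so its price is entered as 0.0.
rows = {
    "Opus 4.8": (5.00, 69.2),
    "GPT-5.5 Pro": (8.00, 62.4),
    "MiniMax M3": (0.30, 59.0),
    "DeepSeek V3.5": (0.15, 52.4),
    "Llama 4": (0.00, 50.2),
}


def dollars_per_point(price: float, score: float) -> float:
    """Input price divided by benchmark score, rounded to three decimals."""
    return round(price / score, 3)


per_point = {name: dollars_per_point(p, s) for name, (p, s) in rows.items()}

for name, value in per_point.items():
    print(f"{name:<14} ${value:.3f}")
```

The metric only counts the listed input price, so a "free" model shows as $0.000 even though self-hosting still carries hardware costs.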

The two figures you can lean on: Opus 4.8 at $5.00 per million input tokens (morphllm), and MiniMax M3 at $0.30 per million input tokens (OpenRouter). On those two alone you're paying roughly one-seventeenth the price for a model that scores about ten points lower on the benchmark (59.0% against 69.2%).

The rest of this table needs flagging. The $8.00 input price for GPT-5.5 Pro doesn't appear in the coding leaderboards, which list GPT-5.5 closer to $5.00 input. "DeepSeek V3.5" doesn't appear to be a real release at all; DeepSeek's actual 2026 line-up is V3.2 and the V4-Pro / V4-Flash models, with different scores and prices (DeepSeek API Docs). And the claimed 50.2% SWE-bench Pro score for Llama 4 runs well above its documented results. Llama 4 being free to self-host is accurate; the score next to it is not.

So the headline that "DeepSeek V3.5 delivers 75% of Opus 4.8's coding performance at 3% of the price" rests on a model that doesn't seem to exist, and you should treat it as unconfirmed. The real version of the point still lands, though: with MiniMax M3 you're getting most of the capability for a tiny share of the cost, and for most jobs that trade is hard to argue with.
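The "share of performance at a share of the price" framing reduces to two ratios. A small Python sketch of it, again taking the table's numbers at face value; `relative` is a hypothetical helper name, and the DeepSeek row is included only to check the arithmetic behind the headline, not to confirm the model:

```python
def relative(challenger: tuple[float, float], baseline: tuple[float, float]) -> tuple[float, float]:
    """Return (score share, price share) of challenger versus baseline, in percent.

    Each argument is (input price in USD per 1M tokens, SWE-bench Pro %).
    """
    c_price, c_score = challenger
    b_price, b_score = baseline
    return round(100 * c_score / b_score, 1), round(100 * c_price / b_price, 1)


opus = (5.00, 69.2)
minimax = (0.30, 59.0)
deepseek_as_listed = (0.15, 52.4)  # unconfirmed model; figures as printed in the table

minimax_share = relative(minimax, opus)
deepseek_share = relative(deepseek_as_listed, opus)
print(minimax_share, deepseek_share)
```

On the table's figures this works out to about 85% of the Opus 4.8 score at 6% of the price for MiniMax M3, and about 76% at 3% for the DeepSeek row.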

When closed models still win

Open isn't always the answer. Three situations where a closed model is the right call:

  1. Maximum capability. When a mistake is genuinely expensive (medical diagnosis, legal advice), you want the best model available, and right now that's still closed.
  2. Ecosystem integration. When you need vendor-specific plumbing, like OpenAI's Assistants API or Anthropic's tool use, the closed product is doing work an open model won't.
  3. Convenience. If you don't have the infrastructure or the people to self-host, paying for an API is the cheaper option once you count the engineering time you'd otherwise spend.

Verdict

Open-weights models have gone from "interesting alternative" to "reasonable default." MiniMax M3 and Llama 4 offer combinations of capability, price, and flexibility that closed models can't touch outside the very top tier, and the gap at the frontier keeps shrinking (VentureBeat).

For most teams, the sensible move now is to start with an open model and only reach for a closed one when you have a specific reason. That's close to the opposite of where the advice sat two years ago.

Source trail

Primary references to keep this briefing grounded

AI and automation information changes quickly. Use these official or primary references to verify the claims, pricing, product behaviour, and compliance details before committing budget or production data.

What to do next

  1. Write the job-to-be-done before looking at another product.
  2. Score each shortlisted tool for workflow fit, data handling, cost, and owner readiness.
  3. Run one small pilot and remove anything the team does not use weekly.
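Step 2 is easier to keep honest with a simple weighted score. A minimal Python sketch follows; the weights, the 1 to 5 rating scale, and the two tool names are invented for illustration and should be replaced with your own.

```python
# Hypothetical weights for the four criteria named in step 2; they sum to 1.0.
WEIGHTS = {
    "workflow_fit": 0.4,
    "data_handling": 0.3,
    "cost": 0.2,
    "owner_readiness": 0.1,
}


def weighted_score(ratings: dict[str, int]) -> float:
    """Combine 1-5 ratings into one weighted score; every criterion must be rated."""
    missing = WEIGHTS.keys() - ratings.keys()
    if missing:
        raise KeyError(f"missing ratings: {sorted(missing)}")
    return round(sum(WEIGHTS[name] * ratings[name] for name in WEIGHTS), 2)


# Made-up ratings for two placeholder tools.
shortlist = {
    "Tool A": {"workflow_fit": 4, "data_handling": 5, "cost": 3, "owner_readiness": 2},
    "Tool B": {"workflow_fit": 3, "data_handling": 3, "cost": 5, "owner_readiness": 4},
}

ranked = sorted(shortlist, key=lambda tool: weighted_score(shortlist[tool]), reverse=True)
print(ranked)
```

Agreeing the weights before anyone sees a demo makes it harder for the most impressive pitch to win by default.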

AI Kick Start is an Illawarra-based AI studio in Figtree, helping businesses across Wollongong, Shellharbour and Kiama and right across Australia put AI to work.
