MiniMax M3 review: Open-weights with 1M context, tested
Release date: 1 June 2026 | Status: Active | Licence: Open
Analysis
For most of the past year, teams choosing an AI model have faced an awkward split. The models that scored best on hard coding tests were the closed ones you rent through an API and never see inside. The models you could download and run on your own hardware were cheaper and more private, but they trailed on the work that mattered. You picked control or you picked capability. Rarely both.
MiniMax M3, released on 1 June 2026, is the clearest sign yet that the gap is closing. It is a Chinese-built open-weights model: you can download it, inspect it, and run it on your own servers. On at least one demanding coding benchmark it edges past models that cost far more and stay locked behind someone else's API.
For an Australian business, the "so what" is simple. If you handle data you can't legally or comfortably send to a third party (client files, medical records, audit material), keeping a capable model entirely in-house used to mean accepting weaker results. M3 narrows that compromise. The catch, as always, is the hardware bill, and a few of the numbers around the launch deserve a closer look before you bank on them.
Benchmarks at a glance
| Metric | Score | Context |
|---|---|---|
| SWE-bench Pro | 59.0% | Vendor-reported; claimed best open-weights coding score |
| MMLU | 86.4% | Competitive, but figure unverified |
| Context window | 1M tokens | Matches closed-model leaders |
| Price (input) | $0.30 / 1M tokens | Launch promo; roughly $0.60 standard |
| Price (output) | $1.20 / 1M tokens | Launch promo; roughly $2.40 standard |
| Licence | Open | Self-hostable |
A note on the two price rows before you build a budget around them: the $0.30 input / $1.20 output figures are MiniMax's launch promotion, reported as a temporary 50% discount. Standard pricing on OpenRouter sits at roughly $0.60 input / $2.40 output per 1M tokens, so plan for the higher number once the promo ends. The 86.4% MMLU figure also deserves a flag; we come back to it below.
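To see what the promo ending does to a budget, here is a minimal sketch. The per-1M-token prices are the figures quoted above; the monthly workload (500M input tokens, 100M output tokens) and the function name are hypothetical, chosen only to illustrate the arithmetic.

```python
# Prices per 1M tokens, in the same dollars the review quotes.
PROMO = {"input": 0.30, "output": 1.20}
STANDARD = {"input": 0.60, "output": 2.40}  # approximate OpenRouter figures


def monthly_cost(input_tokens: int, output_tokens: int, prices: dict[str, float]) -> float:
    """Cost of a token volume at the given per-1M-token prices."""
    return (
        input_tokens / 1_000_000 * prices["input"]
        + output_tokens / 1_000_000 * prices["output"]
    )


# Hypothetical workload, not taken from the review.
INPUT_TOKENS = 500_000_000
OUTPUT_TOKENS = 100_000_000

promo_cost = monthly_cost(INPUT_TOKENS, OUTPUT_TOKENS, PROMO)
standard_cost = monthly_cost(INPUT_TOKENS, OUTPUT_TOKENS, STANDARD)

print(f"promo: ${promo_cost:,.2f}")
print(f"standard: ${standard_cost:,.2f}")
```

For this assumed workload the bill roughly doubles, from about $270 to about $540 a month, which is exactly what a 50% discount expiring implies.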
Why MiniMax M3 matters
The open-weights community has long had to trade capability for accessibility. Affordable, hostable models tended to lag the closed leaders on coding. M3 changes that calculus. Its 59.0% on SWE-bench Pro is widely cited as a leading score for an open-weight model, reportedly clearing the bar set by several proprietary systems, while the model stays fully open. Worth knowing: that score came from MiniMax's own infrastructure with agent scaffolding, and it has not yet been independently reproduced.
How it stacks up against other open models is harder to pin down. Coverage often points to Llama 4 and Qwen 3 as the affordable-but-behind comparison, but the specific SWE-bench Pro figures sometimes quoted for them (around 50.2% and 46.2%) don't match any source we could find; public leaderboard data tells a messier story, and a rival like GLM-5.1 reportedly sits close to M3 at around 58.4%. So treat "best open-weights coding score" as a strong claim rather than a settled fact. Against the closed field, Claude Opus 4.8 still leads at 69.2% on SWE-bench Pro; a frequently repeated Sonnet 4.6 figure of 58.1% appears to be unconfirmed, so we'd hold off on that head-to-head.
On general knowledge, the 86.4% MMLU score in the table above doesn't line up with MiniMax's published numbers either. The vendor reports 84.22% on MMLU-Pro, and no official source gives a plain MMLU of 86.4%, so read that as approximate at best. A reported 86.8% for Gemini 3.5 Flash is likewise an unverified third-party estimate. Either way, for everyday knowledge tasks the difference between these models is too small to matter.
The 1M context advantage
M3 is, as far as we can tell, the only open-weights model with a 1M-token context window, though that "only" is our own read across the models we surveyed rather than something externally confirmed. Independent coverage does describe it as the first open-weight model to combine frontier coding, 1M context and native multimodality, which is the part that counts.
The practical payoff is privacy. For legal document review, medical record analysis, financial audit, or any other work where sending data to a third-party API is off the table, you can run M3 on your own hardware and still feed it very long documents, up to the 1M-token limit. That combination is rare in open models.
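A quick way to sanity-check whether a document set fits in a 1M-token window is a rough token estimate. The sketch below assumes about four characters per token, a common rule of thumb for English text rather than anything measured on M3's tokenizer, and reserves a hypothetical 8,000 tokens for the model's reply. All names here are illustrative.

```python
CONTEXT_WINDOW = 1_000_000  # tokens, per the review
CHARS_PER_TOKEN = 4         # assumption: rough English average, not M3's tokenizer


def estimate_tokens(text: str) -> int:
    """Rough token count: character length divided by 4, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(documents: list[str], reserve_for_output: int = 8_000) -> tuple[bool, int]:
    """Return (fits, estimated_input_tokens) for a batch of documents."""
    total = sum(estimate_tokens(doc) for doc in documents)
    return total + reserve_for_output <= CONTEXT_WINDOW, total


# Example: two synthetic documents of 1.2M and 0.8M characters.
docs = ["x" * 1_200_000, "y" * 800_000]
ok, tokens = fits_in_context(docs)
print(f"estimated tokens: {tokens:,}; fits: {ok}")
```

For a real deployment, count tokens with the model's own tokenizer; the heuristic can be off by a wide margin for code, tables, or non-English text.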
Self-hosting considerations
The open licence is the real differentiator: weights are downloadable from Hugging Face, and the GitHub repo documents inference through SGLang, vLLM and Transformers.
A correction on the formats, though. MiniMax ships native PyTorch/Transformers-compatible weights itself. The GGUF quantisations often mentioned alongside them are produced by a third party, unsloth, not by MiniMax, and llama.cpp support is still preliminary and text-only, without the Sparse Attention that powers the long context. So the picture is not a clean MiniMax-shipped Q4-to-Q8 range.
Be sceptical, too, of any "we ran the Q4_K_M quant on a single A100 80GB" claim. M3 is roughly a 428-billion-parameter model. Per unsloth's own figures, even the smallest 4-bit quant is around 208GB and wants 256GB+ of RAM or multiple GPUs; it will not fit on one 80GB card. By the same logic, the suggestion that two H100s cover real-time serving looks understated: 160GB of GPU memory falls short of even the 4-bit quant, let alone the higher-precision ones. Size your hardware off the deployment docs, not off optimistic rules of thumb.
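The sizing argument is simple arithmetic: parameters times bits per weight, divided by eight, gives bytes. A minimal sketch under stated assumptions: it uses the roughly 428B parameter count quoted above, treats every weight as stored at the same precision (real quants mix precisions, which is why unsloth's 4-bit figure of around 208GB differs slightly), and ignores KV cache and activation memory, so the real requirement is higher. Function names are illustrative.

```python
PARAMS_BILLIONS = 428  # approximate figure quoted in this review


def weight_footprint_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal GB."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def weights_fit(params_billions: float, bits_per_weight: float,
                gpu_memory_gb: float, gpu_count: int = 1) -> bool:
    """True if the weights alone fit in the combined GPU memory."""
    return weight_footprint_gb(params_billions, bits_per_weight) <= gpu_memory_gb * gpu_count


for bits in (4, 8, 16):
    size = weight_footprint_gb(PARAMS_BILLIONS, bits)
    print(f"{bits}-bit weights: ~{size:.0f} GB")

print("one 80GB card, 4-bit:", weights_fit(PARAMS_BILLIONS, 4, 80))
print("two 80GB cards, 4-bit:", weights_fit(PARAMS_BILLIONS, 4, 80, gpu_count=2))
```

Even this optimistic weights-only estimate puts the 4-bit model at about 214GB, well beyond a single 80GB card and beyond two of them combined.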
Verdict
M3 is a genuine milestone for open models. It shows an open-weight system can go toe-to-toe with strong closed models on coding while bringing things they can't (1M context, self-hosting, low price) to the table. If you have the infrastructure to host it, it is the best open model we've used. If you stay on the API, it is still excellent value. Just budget for the real hardware footprint and the post-promo pricing, and take the vendor-reported benchmarks as a starting point rather than the last word.



