Model Review

MiniMax M3 review: Open-weights with 1M context, tested.

MiniMax M3 launched on 1 June 2026 with a vendor-reported 59.0% on SWE-bench Pro, a quoted 86.4% on MMLU, and a 1M context window. At launch-promo pricing of $0.30/$1.20 per million input/output tokens, it is a strong contender for the best open-weights model for coding.

Daniel Fleuren | 2026-06-15 | 13 min read | Founders and operators | Updated 2026-06-19

Written by Daniel Fleuren, Founder, AI Kick Start. 20+ years enterprise IT


TL;DR

MiniMax M3 is an [open-weights model](https://www.minimax.io/blog/minimax-m3) that landed on 1 June 2026 with a 1M-token context window, strong coding scores, and launch pricing well below most rivals. If you can host it yourself, it is the most capable open model we have looked at. On the API, it is still good value. Score: 8.9 / 10.

Key takeaways

M3 is a fully open-weights model with a 1M-token context window, released 1 June 2026 ([MiniMax](https://www.minimax.io/blog/minimax-m3)).
Its 59.0% SWE-bench Pro score is a leading open-weight result, but it is vendor-reported and not yet independently verified.
Headline pricing ($0.30/$1.20 per 1M tokens) is a temporary launch promo; standard rates are about double.
The 86.4% MMLU figure doesn't match MiniMax's published MMLU-Pro of 84.22%; treat it as approximate.
Self-hosting is real but heavy: at ~428B parameters, even the smallest 4-bit quant runs to ~208GB, so plan for multi-GPU rigs, not a single card.

Release date: 1 June 2026 | Status: Active | Licence: Open

Analysis

For most of the past year, teams choosing an AI model have faced an awkward split. The models that scored best on hard coding tests were the closed ones you rent through an API and never see inside. The models you could download and run on your own hardware were cheaper and more private, but they trailed on the work that mattered. You picked control or you picked capability. Rarely both.

MiniMax M3, released on 1 June 2026, is the clearest sign yet that the gap is closing. It is a Chinese-built open-weights model, meaning you can download it, inspect it, and run it on your own servers, and on at least one demanding coding benchmark it edges past models that cost far more and stay locked behind someone else's API.

For an Australian business, the "so what" is simple. If you handle data you can't legally or comfortably send to a third party (client files, medical records, audit material), a capable model you could keep entirely in-house used to mean accepting weaker results. M3 narrows that compromise. The catch, as always, is the hardware bill, and a few of the numbers around the launch deserve a closer look before you bank on them.

Benchmarks at a glance

Metric	Value	Notes
SWE-bench Pro	59.0%	Leading open-weights coding score (vendor-reported)
MMLU	86.4%	Approximate; not matched by MiniMax's published figures
Context window	1M tokens	Matches closed-model leaders
Price (input)	$0.30 / 1M tokens	Launch promo; about $0.60 standard
Price (output)	$1.20 / 1M tokens	Launch promo; about $2.40 standard
Licence	Open	Self-hostable

A note on the two price rows before you build a budget around them: the $0.30 input / $1.20 output figures are MiniMax's launch promotion, reported as a temporary 50% discount. Standard pricing on OpenRouter sits at roughly $0.60 input / $2.40 output per 1M tokens, so plan for the higher number once the promo ends. The 86.4% MMLU figure is also worth flagging; see below.
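To see what the end of the promo does to a budget, here is a minimal sketch in Python. The two rates are the figures quoted above; the workload (500M input and 100M output tokens a month) is a made-up example, and `Rate` and `monthly_cost` are illustrative names of our own, not part of any MiniMax or OpenRouter tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    """Price per 1M tokens, in the currency the price list quotes."""
    input_per_m: float
    output_per_m: float


# Figures quoted in this review; check the current price lists before budgeting.
PROMO = Rate(input_per_m=0.30, output_per_m=1.20)
STANDARD = Rate(input_per_m=0.60, output_per_m=2.40)


def monthly_cost(rate: Rate, input_tokens: int, output_tokens: int) -> float:
    """Return the cost of the given token volumes at the given rate."""
    input_cost = (input_tokens / 1_000_000) * rate.input_per_m
    output_cost = (output_tokens / 1_000_000) * rate.output_per_m
    return input_cost + output_cost


if __name__ == "__main__":
    # HYPOTHETICAL workload: 500M input and 100M output tokens per month.
    workload = {"input_tokens": 500_000_000, "output_tokens": 100_000_000}
    print(f"promo:    ${monthly_cost(PROMO, **workload):,.2f}")
    print(f"standard: ${monthly_cost(STANDARD, **workload):,.2f}")
```

On that hypothetical workload the monthly bill moves from $270 on the promo to $540 at the standard rate, so the safer planning number is the second one.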

Why MiniMax M3 matters

The open-weights community has long had to trade capability for accessibility. Affordable, hostable models tended to lag the closed leaders on coding. M3 changes that calculus. Its 59.0% on SWE-bench Pro is widely cited as a leading score for an open-weight model, reportedly clearing the bar set by several proprietary systems, while the model stays fully open. Worth knowing: that score came from MiniMax's own infrastructure with agent scaffolding, and it has not yet been independently reproduced.

How it stacks up against other open models is harder to pin down. Coverage often points to Llama 4 and Qwen 3 as the affordable-but-behind comparison, but the specific SWE-bench Pro figures sometimes quoted for them (around 50.2% and 46.2%) don't match any source we could find; public leaderboard data tells a messier story, and a rival like GLM-5.1 reportedly sits close to M3 at around 58.4%. So treat "best open-weights coding score" as a strong claim rather than a settled fact. Against the closed field, Claude Opus 4.8 still leads at 69.2% on SWE-bench Pro; a frequently repeated Sonnet 4.6 figure of 58.1% appears to be unconfirmed, so we'd hold off on that head-to-head.

On general knowledge, the headline 86.4% MMLU score doesn't line up with MiniMax's published numbers either. The vendor reports 84.22% on MMLU-Pro, and no official source gives a plain MMLU of 86.4%, so read that as approximate at best. A reported 86.8% for Gemini 3.5 Flash is likewise an unverified third-party estimate. Either way, for everyday knowledge tasks the difference between these models is too small to matter.

The 1M context advantage

M3 is, as far as we can tell, the only open-weights model with a 1M-token context window, though that "only" is our own read across the models we surveyed rather than something externally confirmed. Independent coverage does describe it as the first open-weight model to combine frontier coding, 1M context and native multimodality, which is the part that counts.

The practical payoff is privacy. For legal document review, medical record analysis, financial audit, or any other work where sending data to a third-party API is off the table, you can run M3 on your own hardware and still feed it very long documents, up to the 1M-token limit. That combination is rare in open models.
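Before committing to a long-document workflow, it helps to check roughly whether a bundle of files fits in the window at all. The sketch below is a back-of-envelope estimate only: the four-characters-per-token ratio is a common rule of thumb for English text, not a property of M3's tokenizer, and the function names and the output reserve are our own assumptions.

```python
import math
from typing import Iterable

CONTEXT_LIMIT = 1_000_000  # M3's advertised context window, in tokens
CHARS_PER_TOKEN = 4.0      # ASSUMPTION: rough rule of thumb for English text


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Approximate the token count of a text from its character length."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(
    documents: Iterable[str],
    reserve_for_output: int = 8_000,  # ASSUMPTION: room left for the reply
    limit: int = CONTEXT_LIMIT,
) -> bool:
    """Return True if the documents plus the output reserve fit the window."""
    total = sum(estimate_tokens(doc) for doc in documents)
    return total + reserve_for_output <= limit


if __name__ == "__main__":
    # HYPOTHETICAL bundle: three documents of 600,000 characters each.
    bundle = ["x" * 600_000] * 3
    print(sum(estimate_tokens(doc) for doc in bundle), fits_in_context(bundle))
```

For a real decision, count tokens with the model's own tokenizer, since legal and medical text, tables and non-English content can tokenize quite differently from plain prose.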

Self-hosting considerations

The open licence is the real differentiator: weights are downloadable on HuggingFace and the GitHub repo documents inference through SGLang, vLLM and Transformers.

A correction on the formats, though. MiniMax ships native PyTorch/Transformers-compatible weights itself. The GGUF quantisations often mentioned alongside them are produced by a third party, unsloth, not by MiniMax, and llama.cpp support is still preliminary and text-only, without the Sparse Attention that powers the long context. So the picture is not a clean MiniMax-shipped Q4-to-Q8 range.

Be sceptical, too, of any "we ran the Q4_K_M quant on a single A100 80GB" claim, including the one in the source draft. M3 is roughly a 428-billion-parameter model. Per unsloth's own figures, even the smallest 4-bit quant is around 208GB and wants 256GB+ of RAM or multiple GPUs; it will not fit on one 80GB card. By the same logic, the suggestion that two H100s cover real-time serving looks understated: 160GB of GPU memory falls short even of the ~208GB 4-bit quant, let alone the higher-precision ones. Size your hardware off the deployment docs, not off optimistic rules of thumb.
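The arithmetic behind that scepticism is simple enough to sketch. The Python below estimates the size of the weights alone from parameter count and bits per weight, then works out how many 80GB cards that implies. The ~428B parameter count and the ~208GB quant size come from the figures above; the function names are our own, and the estimate deliberately ignores KV cache, activations and runtime overhead, so treat the results as a floor rather than a sizing guide.

```python
import math


def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Nominal size of the weights alone, in GB (taking 1 GB as 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def min_gpus(required_gb: float, gpu_memory_gb: float = 80.0) -> int:
    """Smallest number of GPUs whose combined memory covers required_gb.

    Lower bound only: ignores KV cache, activations and framework overhead.
    """
    return math.ceil(required_gb / gpu_memory_gb)


if __name__ == "__main__":
    params_b = 428  # approximate parameter count quoted in this review
    for bits in (4, 8, 16):
        size = weight_size_gb(params_b, bits)
        print(f"{bits}-bit: ~{size:.0f} GB nominal, at least {min_gpus(size)} x 80GB GPUs")

    # The reported ~208GB 4-bit quant against one card and against two.
    print(min_gpus(208.0))  # 3
    print(2 * 80 >= 208)    # False
```

The nominal 4-bit figure (214GB) lands close to the ~208GB reported for the smallest quant, and either number needs at least three 80GB cards before any working memory is counted. A 1M-token context adds a substantial cache on top, which is another reason to size from the deployment docs.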

Verdict

M3 is a genuine milestone for open models. It shows an open-weight system can go toe-to-toe with strong closed models on coding while bringing things they can't (1M context you can self-host, at a low price) to the table. If you have the infrastructure to host it, it is the best open model we've used. If you stay on the API, it is still excellent value. Just budget for the real hardware footprint and the post-promo pricing, and take the vendor-reported benchmarks as a starting point rather than the last word.

Score: 8.9 / 10
