AI News

MiniMax M3: The First Open-Weights Model With 1 Million Token Context.

MiniMax M3 launched on 1 June 2026 as the first open-weights model to offer a 1-million-token context window. At $0.30/$1.20, it represents a new frontier for accessible long-context AI.

Daniel Fleuren2026-06-0511 min readFounders and operatorsUpdated 2026-06-19

Written by

Daniel Fleuren

Founder, AI Kick Start. 20+ years enterprise IT

Updated 2026-06-19

AI Kick Start editorial image for MiniMax M3: The First Open-Weights Model With 1 Million Token Context.

Decision

Shortlist

Score tools by workflow fit, data handling, owner readiness, and cost at scale before buying seats.

Risk to watch

Shelfware

A capable tool still fails if nobody owns the workflow or checks whether it is used weekly.

Proof to collect

Pilot score

Run one real task through each shortlisted tool and record quality, time saved, and support burden.

TL;DR

TL;DR: MiniMax M3, [released 1 June 2026](https://the-decoder.com/minimax-m3-open-weight-model-with-a-million-token-context-challenges-proprietary-leaders/), is an open-weights model that pairs a 1-million-token context window with frontier coding performance. Priced at [$0.30/$1.20 per million tokens](https://artificialanalysis.ai/models/minimax-m3) and scoring 59.0% on SWE-Bench Pro, it pushes back on the idea that a reliably usable million-token window has to come from a closed, proprietary vendor.

Key takeaways

MiniMax M3 is an open-weights model offering a reliably usable 1-million-token context window with frontier coding performance ([The Decoder, 2026](https://the-decoder.com/minimax-m3-open-weight-model-with-a-million-token-context-challenges-proprietary-leaders/))
It uses MiniMax Sparse Attention (MSA) on top of grouped query attention to keep long-context costs down; MiniMax reports roughly one-twentieth the per-token compute of its predecessor ([Fello AI, 2026](https://felloai.com/minimax-m3/))
An SWE-Bench Pro score of 59.0% places M3 narrowly ahead of GPT-5.5 (58.6%) and behind Claude Opus 4.8 (69.2%) ([Morph SWE-Bench Pro, 2026](https://www.morphllm.com/swe-bench-pro))
MiniMax reports 100% lossless needle-in-a-haystack recall at full context ([MiniMax, 2026](https://www.minimax.io/models/text/m3))

Analysis

For most of the past two years, "long context" was a feature you rented. If you wanted to feed an AI model a whole contract, a full codebase, or a year of support tickets in one go, you went to OpenAI, Google, or Anthropic, paid their rates, and accepted that the weights stayed on their servers, not yours.

MiniMax M3 chips away at that arrangement. Launched on 1 June 2026, it is an open-weights model with a 1-million-token context window and coding scores in the same conversation as the big proprietary names. You can download the weights, read how it was built, and run it on hardware you control.

For an Australian business, the practical question is simple: do you have to keep handing your longest, most sensitive documents to someone else's API, or can you now run a capable long-context model in-house? M3 makes that a real choice rather than a thought experiment. The catch, as usual, is in the hardware bill and the licence fine print.

Architecture and Technical Innovation

M3 is built on a Mixture-of-Experts (MoE) design. Rather than firing every parameter for every token, it routes each token to a subset of "expert" modules, which keeps the compute bill in check. That matters enormously at a million tokens, where a traditional dense model would chew through memory and money. Note that the specific split this article originally cited (32 billion active out of 256 billion total) does not match the sources: the official repo and Artificial Analysis put M3 at roughly 428 billion total parameters with about 23 billion active.

The piece that makes the long window affordable is the model's sparse attention. Standard attention costs scale with the square of the context length, which is what makes a million-token window so expensive on paper. M3 narrows what each token actually attends to. MiniMax calls this approach MiniMax Sparse Attention (MSA), built on top of grouped query attention. (An earlier version of this article called it "dynamic sparse attention" and claimed a tidy "3x the 128K cost versus 64x" improvement; neither the name nor that exact figure is supported by MiniMax's own materials, which instead describe roughly one-twentieth the per-token compute of the predecessor and reported speedups of around 9x on prefill and 15x on decode at 1M context.)

Supporting AI Kick Start editorial image for minimax-m3-first-open-weights-1m-context. — Generated AI Kick Start editorial visual used to explain the article's practical workflow and trade-offs.

Benchmark Performance

On coding, M3 scores 59.0% on SWE-Bench Pro, landing just ahead of GPT-5.5 at 58.6% and behind Claude Opus 4.8 at 69.2%. Worth flagging: the original draft called this "SWE-bench Verified," but the 59.0% figure and its comparators are SWE-Bench Pro numbers, not Verified. The ranking holds; the label was wrong.

A few other scores appeared in the original write-up, MMLU-Pro 79.4%, HumanEval 84.2%, MATH 68.7%, but these are unconfirmed. MiniMax published agentic and coding benchmarks (SWE-Bench Pro, Terminal-Bench, BrowseComp and others), not those three, so treat the figures as unsourced rather than established.

What you can say plainly is this: M3 is competitive on coding while also offering a 1M-token window and downloadable weights, which is a combination few rivals match.

The "needle in a haystack" test, can the model find one specific fact buried in a very long document, is often where long-context claims fall apart. MiniMax reports 100% lossless recall on that test at full context. (The original article cited a 97% figure attributed to "independent testing" and compared it to Gemini 1.5 Pro; we couldn't verify that testing or the comparison, and Gemini 1.5 Pro is a 2024-era model, so we've dropped that framing.) Either way, reliable retrieval is what separates a genuinely useful long window from a number on a spec sheet.

Pricing and Accessibility

Through MiniMax's API, M3 runs at $0.30 per million input tokens and $1.20 per million output tokens. The original draft claimed this undercut Gemini 3.5 Flash at "$0.35/$0.70" on inputs; that comparison is wrong, since Gemini 3.5 Flash is priced at $1.50 input and $9.00 output per million tokens, so M3 is well cheaper rather than narrowly so. For the open-weights version there's no per-token fee at all, you pay for hardware and run it yourself.

Running M3 at the full million-token context is not cheap to self-host. Deployment guides point to around 8x H100 GPUs (640GB) for full-context single-request inference at FP8. The roughly $200,000 capital figure often quoted for that kit is plausible but isn't directly sourced, so treat it as a ballpark. If you'd rather not buy GPUs, M3 is also hosted through providers including OpenRouter and NVIDIA's NIM catalogue at competitive per-token rates. (The original article named Lambda Labs, Together AI, and Fireworks as hosts, but we couldn't confirm those specifically.)

Impact on the Open-Weights Ecosystem

This is where the original article's framing needs a correction. It claimed the longest-context open model before M3 was Llama 4 at 128K tokens, extended 8x by M3. That's not right: 128K was Llama 3, and Llama 4 Scout advertises a 10-million-token window. So M3 is not the only open model to reach the million-token mark. The more defensible claim is that M3 is an open-weights model offering a reliably usable 1M window alongside frontier coding performance, a sharper bar than raw advertised context length.

Either way, a usable million-token window in open weights opens doors that were awkward before: multi-document legal review, reading a whole repository at once, working through long-form transcripts or video.

The release has reignited the open-versus-closed argument. Supporters say it shows open models can keep pace on the capabilities people actually care about. Skeptics point out that MoE models are fiddlier to fine-tune and deploy than plain dense models, which can blunt the "open" advantage in practice. There's also a licence caveat: M3 ships under the MiniMax Community License, and Artificial Analysis notes that commercial use requires a separate agreement, so "minimal restrictions" oversells it. Read the licence before you build on it.

Source trail

Primary references to keep this briefing grounded

AI and automation information changes quickly. Use these official or primary references to verify the claims, pricing, product behaviour, and compliance details before committing budget or production data.

MiniMax developer resources

What to do next

Write the job-to-be-done before looking at another product.
Score each shortlisted tool for workflow fit, data handling, cost, and owner readiness.
Run one small pilot and remove anything the team does not use weekly.

Want help applying this? Explore the AI tools directory.

AI Kick Start is an Illawarra-based AI studio in Figtree, helping businesses across Wollongong, Shellharbour and Kiama and right across Australia put AI to work.

Explore with AI

Use the article as a decision prompt

Summarise this AI Kick Start article for an Australian business owner. Focus on the useful decision, the risks, and the first practical next step: MiniMax M3: The First Open-Weights Model With 1 Million Token Context

Read with ChatGPT Open Claude Search with AI Mode

Turn this into a practical roadmap.

Use the guide as a starting point, then map the first workflow worth building.

Book an AI strategy call