AI News

Context Windows Hit 1 Million Tokens: What It Means for Developers

One-million-token context windows are now available from multiple providers. We explore the applications this enables and the practical challenges of using such enormous contexts.

Decision

Shortlist

Score tools by workflow fit, data handling, owner readiness, and cost at scale before buying seats.

Risk to watch

Shelfware

A capable tool still fails if nobody owns the workflow or checks whether it is used weekly.

Proof to collect

Pilot score

Run one real task through each shortlisted tool and record quality, time saved, and support burden.

TL;DR

One-million-token context windows are now available from MiniMax M3, DeepSeek's latest models, and Google's Gemini family. The capability opens up new application categories (full codebase analysis, multi-document legal review, and long-form content processing) but brings practical challenges around cost, latency, and effective context utilisation.

Key takeaways

  • Several major models now offer 1M-token context windows, priced from roughly $0.15 to $3.50 per million input tokens, though several of these figures are promotional or unconfirmed (Source: Provider pricing, 2026; see [OpenRouter, MiniMax M3](https://openrouter.ai/minimax/minimax-m3))
  • Full codebase analysis, multi-document legal review, and long-form content creation are newly enabled applications (Source: Application analysis, 2026)
  • Needle-in-a-haystack accuracy at full 1M-token scale is reported to range from 93% to 97% across providers, but those specific figures are unconfirmed (Source: Independent testing, 2026)
  • Cost, latency, and effective context management remain significant practical challenges (Source: Developer reports, 2026)

Analysis

For years, the standard way to make an AI read a long document was to chop it into pieces, store the pieces, and feed the model only the bits that looked relevant to your question. It worked, but it was fiddly, and it broke in annoying ways. As of mid-2026, a handful of models will just take the whole thing.

A million tokens of context is roughly 750,000 words. That is the entire works of Shakespeare, or a medium-sized software project, dropped into a single prompt and read in one go. Twelve months ago, 128,000 tokens counted as a long context window. The new ceiling is about eight times bigger.

For an Australian business team, the "so what" is straightforward. A lot of work that used to need a custom retrieval system (a search layer, a vector database, a pile of glue code) can now be done by handing the model the source material directly and asking a plain question. That is cheaper to build and easier to reason about.

The catch is that bigger isn't automatically better. These long-context requests cost more per call, run slower, and reward teams who structure their inputs carefully. The rest of this piece walks through what the million-token window actually unlocks, and where it bites.

The million-token context window has arrived. As of June 2026, developers can choose from several models built around 1 million tokens of context.

MiniMax M3 is open-weight and launched at roughly $0.30/$1.20 per million input/output tokens. That is a 50%-off launch promotion; the standard rate is closer to $0.60/$2.40 (OpenRouter, MiniMax M3 pricing & benchmarks). DeepSeek's newest open-weight release also ships a native 1M context. DeepSeek's line went from V3.2 to a V4 Preview in April 2026, so there is no "V3.5", and the often-quoted $0.15/$0.60 figure for the new release is unconfirmed (DeepSeek API Docs, V4 Preview release). Google's Gemini 3.5 Flash carries a 1M-token input window too, reportedly priced nearer $1.50/$9.00 than the $0.35/$0.70 sometimes cited (OpenRouter, Gemini 3.5 Flash). Gemini 3.1 Pro is, by available accounts, a 2M-token model priced around $2/$12 rather than the $3.50/$10.50 figure that circulates.

A year ago, 128K tokens was considered long context. Today that is about one-eighth of the new standard (The Decoder, million-token context for open models).

This is more than a spec bump. It changes what these systems can do. A million tokens is about 750,000 words (token-to-word ratio, industry standard ~0.75 words/token), enough to hold the entire King James Bible, the complete works of Shakespeare, or a medium-sized software codebase in a single prompt. Work that used to demand a complex retrieval architecture can now run on plain prompt engineering.
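The arithmetic behind those figures is simple enough to sketch. The snippet below assumes the 0.75 words-per-token rule of thumb quoted above, which varies by tokenizer and language; the function names are illustrative only.

```python
# Rule of thumb quoted in the article; real ratios depend on the tokenizer and the text.
WORDS_PER_TOKEN = 0.75


def tokens_to_words(tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Convert a token budget into an approximate word count."""
    return round(tokens * words_per_token)


def window_ratio(new_window: int, old_window: int) -> float:
    """How many times larger the new context window is than the old one."""
    return new_window / old_window


print(tokens_to_words(1_000_000))        # 750000
print(window_ratio(1_000_000, 128_000))  # 7.8125
```

So "about eight times bigger" is a rounding of roughly 7.8x.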

What 1M Tokens Enables

The new applications fall into three broad areas.

Full codebase understanding: at a typical 10 to 20 tokens per line, a 1M-token context can hold somewhere around 50,000 to 100,000 lines of code, depending on the language and how heavily it's commented. Treat that as an order-of-magnitude estimate rather than a measured figure. It is enough to cover many individual microservices, libraries, or apps. You can ask "how does authentication work in this codebase?" or "find every place we sanitise user input" and have the model read the whole repository in one pass. Tools like Kimi K2.7 Code have shown real strength at spotting cross-file dependencies and refactoring opportunities. Note, though, that K2.7 Code runs a 256K-token window rather than a full 1M, so the very largest repos still need to be fed in sections (Codersera, Kimi K2.7 Code guide).
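Before pasting a repository into a prompt, it helps to check roughly whether it will fit. The sketch below is one way to do that, assuming a crude heuristic of about 4 characters per token; real tokenizers differ, so treat the result as a ballpark. The extension list, the 100K-token output reserve, and the function names are all illustrative assumptions.

```python
import os

# Assumption: ~4 characters per token is a rough average; a provider's tokenizer will differ.
CHARS_PER_TOKEN = 4.0
# Illustrative set of source-file extensions to count; adjust for your project.
CODE_EXTENSIONS = {".py", ".js", ".ts", ".go", ".java", ".rs", ".c", ".cpp", ".h", ".md"}


def estimate_repo_tokens(root, extensions=CODE_EXTENSIONS, chars_per_token=CHARS_PER_TOKEN):
    """Walk a directory tree and estimate the token count of its source files."""
    total_chars = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories (such as .git) in place so os.walk skips them.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            if os.path.splitext(name)[1].lower() in extensions:
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    total_chars += len(fh.read())
    return int(total_chars / chars_per_token)


def fits_in_window(estimated_tokens, window_tokens=1_000_000, reserve_for_output=100_000):
    """True if the estimate leaves the reserved headroom for the model's reply."""
    return estimated_tokens <= window_tokens - reserve_for_output
```

Calling `fits_in_window(estimate_repo_tokens("path/to/repo"), window_tokens=256_000)` would run the same check against a 256K-token model.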

Multi-document legal and financial analysis: case files, financial filings, and regulatory submissions often run to hundreds or thousands of pages. With a 1M-token context, a lawyer can load an entire case file (complaints, motions, depositions, exhibits) and ask the model to flag inconsistencies, summarise the key arguments, or draft a responsive pleading. A financial analyst can pull in years of filings, earnings-call transcripts, and analyst notes to build out an investment thesis.

Long-form content creation and analysis: authors, researchers, and content teams can work at document length instead of paragraph length. A novelist can ask the model to check a 200,000-word manuscript for plot holes. A researcher can pull findings together across dozens of papers. A journalist can run thousands of pages of leaked documents to surface patterns and connections.

The Practical Challenges

The enthusiasm is warranted, but long-context work comes with real constraints you have to plan around.

Cost: even at budget pricing, a full 1M-token prompt runs somewhere around $0.15 to $0.35 in input alone, and the lower end of that range leans on the unconfirmed DeepSeek figure noted earlier. Add a long response, say 100K tokens, and a single request can hit $0.75 to $1.50, depending on the model's output rate. Across many documents that adds up fast. A legal discovery job running 10,000 documents at full context, at the upper figure of $1.50 per request, would cost in the region of $15,000 per run. That is an illustrative projection, not a quoted price.
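The per-request and per-batch arithmetic can be sketched in a few lines. The rates used in the example are the reported MiniMax M3 standard prices mentioned earlier in this article ($0.60/$2.40 per million input/output tokens), which should be checked against the provider's current price list; the function names are illustrative.

```python
def request_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Cost in dollars of one request, given per-million-token prices."""
    input_cost = (input_tokens / 1_000_000) * input_price_per_m
    output_cost = (output_tokens / 1_000_000) * output_price_per_m
    return input_cost + output_cost


def batch_cost(num_requests, cost_per_request):
    """Cost in dollars of running the same-sized request many times."""
    return num_requests * cost_per_request


# Assumed rates: the reported MiniMax M3 standard pricing from this article, not a confirmed quote.
per_request = request_cost(1_000_000, 100_000, 0.60, 2.40)
print(f"{per_request:.2f}")          # 0.84
print(batch_cost(10_000, 1.50))      # 15000.0
```

Swapping in another model's rates shows how quickly the per-run total moves with the output price, which is usually several times the input price.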

Latency: long-context inference is slower than short-context, full stop. Generic estimates put a 1M-token request at 30 to 90 seconds, though that is a loose ceiling. MiniMax M3 in particular is considerably faster thanks to its sparse-attention design, which is named MiniMax Sparse Attention rather than the "dynamic sparse attention" tag that sometimes gets attached to it (GitHub, MiniMax-AI/MiniMax-M3). Either way, this suits batch workflows far better than anything real-time.

Effective utilisation: models don't all use long context equally well. Needle-in-a-haystack tests (can the model find one specific fact buried in a long document?) show wide variation. Figures circulating put MiniMax M3 near 97% accuracy at 1M tokens and some DeepSeek models around 93%, but those specific numbers are unconfirmed and should be treated as rumoured rather than measured. What is well established is the broader pattern: some models that advertise a 1M-token window degrade noticeably past about 600K tokens in practice.
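Given how uncertain the published accuracy figures are, teams may want to run their own needle-in-a-haystack check. The sketch below shows the general shape of such a test: bury one fact at different depths in filler text and see whether the answer comes back. `stub_model` is a placeholder that just searches the prompt, not a real provider API; in practice it would be replaced by a call to the model under test. The filler text, passcode, and function names are all made up for illustration.

```python
import re
from typing import Callable, Dict, Iterable

FILLER = "The quick brown fox jumps over the lazy dog. "


def build_haystack(needle: str, total_chars: int, depth: float) -> str:
    """Return about total_chars of filler with the needle inserted at a relative depth (0.0 to 1.0)."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    filler = (FILLER * (total_chars // len(FILLER) + 1))[:total_chars]
    cut = int(len(filler) * depth)
    return filler[:cut] + " " + needle + " " + filler[cut:]


def run_needle_test(ask_model: Callable[[str], str], needle: str, question: str,
                    expected: str, total_chars: int, depths: Iterable[float]) -> Dict[float, bool]:
    """Ask the same question with the needle at each depth; record whether the answer contains the expected text."""
    results = {}
    for depth in depths:
        prompt = build_haystack(needle, total_chars, depth) + "\n\nQuestion: " + question
        results[depth] = expected.lower() in ask_model(prompt).lower()
    return results


def stub_model(prompt: str) -> str:
    # Placeholder for a real model call: it simply searches the prompt text.
    match = re.search(r"passcode is (\d+)", prompt)
    return match.group(1) if match else "not found"


results = run_needle_test(stub_model, "The vault passcode is 7421.",
                          "What is the vault passcode?", "7421",
                          total_chars=20_000, depths=[0.0, 0.5, 1.0])
print(results)
```

Running the same loop at several context sizes, not just several depths, is what would surface the kind of degradation past a certain length described above.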

Context management: having room for 1M tokens doesn't mean you should fill it. Good long-context prompting takes structure: well-organised documents, clear sections, and explicit instructions about what to focus on. Skip that and the model can drown in the volume and hand back worse answers than it would from a shorter, tighter prompt.
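One way to impose that structure is to assemble the prompt programmatically so every document gets a labelled section and the instructions sit in a predictable place. The sketch below is a minimal example; the delimiter style, section labels, and function name are arbitrary choices for illustration, not a format any provider requires.

```python
from typing import List, Tuple


def build_structured_prompt(task: str, documents: List[Tuple[str, str]], focus: List[str]) -> str:
    """Assemble instructions, focus points, and clearly delimited documents into one prompt."""
    parts = ["INSTRUCTIONS", task, ""]
    if focus:
        parts.append("FOCUS ON")
        parts.extend(f"- {item}" for item in focus)
        parts.append("")
    for number, (title, text) in enumerate(documents, start=1):
        # Numbered, titled sections let the model (and the reader) refer back to a source.
        parts.append(f"=== DOCUMENT {number}: {title} ===")
        parts.append(text.strip())
        parts.append(f"=== END DOCUMENT {number} ===")
        parts.append("")
    parts.append("Answer using only the documents above and cite document numbers.")
    return "\n".join(parts)


prompt = build_structured_prompt(
    "Flag inconsistencies between the documents.",
    [("Complaint", "The contract was signed on 3 March."),
     ("Deposition", "The witness recalls signing in April.")],
    ["dates", "named parties"],
)
print(prompt)
```

Whether instructions work better before or after the documents can vary by model, so it is worth testing both orders on a real task.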

Source trail

Primary references to keep this briefing grounded

AI and automation information changes quickly. Use these official or primary references to verify the claims, pricing, product behaviour, and compliance details before committing budget or production data.

What to do next

  1. Write the job-to-be-done before looking at another product.
  2. Score each shortlisted tool for workflow fit, data handling, cost, and owner readiness.
  3. Run one small pilot and remove anything the team does not use weekly.

AI Kick Start is an Illawarra-based AI studio in Figtree, helping businesses across Wollongong, Shellharbour and Kiama and right across Australia put AI to work.
