DeepSeek V3.5: The $0.15 per Million Token Model That Disrupts Pricing.

DeepSeek V3.5 launched on 20 March 2026 with a 1-million-token context and prices that undercut every major competitor. We analyse whether extreme affordability comes with hidden costs.

Daniel Fleuren | 2026-04-05 | 10 min read | Founders and operators | Updated 2026-06-19

Written by

Daniel Fleuren

Founder, AI Kick Start. 20+ years enterprise IT

Decision

Shortlist

Score tools by workflow fit, data handling, owner readiness, and cost at scale before buying seats.

Risk to watch

Shelfware

A capable tool still fails if nobody owns the workflow or checks whether it is used weekly.

Proof to collect

Pilot score

Run one real task through each shortlisted tool and record quality, time saved, and support burden.

TL;DR

DeepSeek V3.5, reportedly released on 20 March 2026, is said to offer a 1-million-token context window at $0.15/$0.60 per million input/output tokens, which would make it the lowest-priced model of comparable capability. These details are unconfirmed: we could not find a "V3.5" in DeepSeek's [official release history](https://api-docs.deepseek.com/updates), which jumps from V3.2 straight to the V4 line. Treat the specifics below as claims to verify, not settled fact. The broader point still holds: DeepSeek's whole pitch is being cheap enough to change what you can afford to run.

Key takeaways

DeepSeek V3.5 reportedly costs $0.15/$0.60 per million input/output tokens, which would make it the cheapest major model with a 1M context window. This is unconfirmed: no such model appears in DeepSeek's [official changelog](https://api-docs.deepseek.com/updates) (Source: DeepSeek, 2026)
The quoted benchmark scores are mid-tier (76.8% MMLU-Pro, 81.4% HumanEval, 48.7% SWE-bench) and could not be verified against any documented model (Source: DeepSeek, 2026)
A reported 93% needle-in-a-haystack accuracy at 1M tokens would be adequate but behind best-in-class; the figure is attributed to unnamed testing we couldn't confirm (Source: Independent testing, 2026)
A reported generation speed of 95 tokens per second would be competitive but slower than some alternatives; this figure is also unsourced (Source: Independent testing, 2026)

Analysis

A note before you read on. We went looking for the model this article is about and could not confirm it exists. DeepSeek's own changelog lists V3.2 in December 2025, then the V4 family in April 2026, with no "V3.5" and no 20 March 2026 release. Several of the prices and benchmark figures quoted here also don't line up with any DeepSeek model we can find documentation for.

So read this as a report on a set of circulating claims about a budget Chinese model, not as a spec sheet you can buy against. We've kept every number the original draft carried, but flagged the unconfirmed ones as exactly that. Where a fact does check out (who funds DeepSeek, what the big rivals actually charge, the recent US export action), we've linked the source.

Why bother running it at all? Because the underlying story is real and worth understanding. DeepSeek has spent two years undercutting everyone else on price, and a sub-dollar model with a million-token window would genuinely shift the maths for high-volume work. If a model like the one described below ships and the pricing is anywhere near accurate, plenty of Australian teams will want to know what they'd be trading away to get it.

DeepSeek has built its name on one thing: being cheap. The Chinese lab is funded by the quantitative trading firm High-Flyer, and its models have repeatedly come in 5 to 10 times under competitors while staying usable. The reportedly released V3.5 is described as that strategy pushed to the limit.

At a claimed $0.15 per million input tokens and $0.60 per million output tokens, V3.5 would be the cheapest model from any major lab. On those figures it's pitched as 23x cheaper than GPT-5.5 ($5/$30) and 33x cheaper than Claude Opus 4.8 ($5/$25). Those two competitor prices do check out, but the 23x multiple doesn't follow from them: both rivals list $5 per million input tokens, which is about 33x the claimed $0.15, and the output gap is wider still (50x and roughly 42x). The draft also claims V3.5 is 2x cheaper than Gemini 3.5 Flash at $0.35/$0.70. That Gemini figure looks wrong, since Gemini 3.5 Flash is documented at $1.50/$9.00 per million tokens. And unlike most budget models, V3.5 is said to ship a 1-million-token context window, the same range as MiniMax M3, Gemini 3.5 Flash, and Gemini 3.1 Pro, which do all run 1M context.
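Price multiples like these are simple to recompute. The following is a minimal sketch, assuming the per-million-token prices quoted in this article; the V3.5 figures are unconfirmed claims, and the names `PRICES` and `price_ratio` are hypothetical helpers introduced only for illustration.

```python
# Per-million-token prices in USD as quoted in this article: (input, output).
# The V3.5 figures are unconfirmed claims, not published list prices.
PRICES = {
    "DeepSeek V3.5 (claimed)": (0.15, 0.60),
    "GPT-5.5": (5.00, 30.00),
    "Claude Opus 4.8": (5.00, 25.00),
    "Gemini 3.5 Flash (documented)": (1.50, 9.00),
}

BASELINE = "DeepSeek V3.5 (claimed)"


def price_ratio(rival: str, baseline: str = BASELINE) -> tuple[float, float]:
    """Return how many times more the rival costs on input and on output."""
    rival_in, rival_out = PRICES[rival]
    base_in, base_out = PRICES[baseline]
    return rival_in / base_in, rival_out / base_out


# Print the input and output multiple for every rival in the table.
for name in PRICES:
    if name == BASELINE:
        continue
    in_ratio, out_ratio = price_ratio(name)
    print(f"{name}: {in_ratio:.1f}x on input, {out_ratio:.1f}x on output")
```

Keeping input and output ratios separate matters because output tokens are priced several times higher than input tokens on every model listed, so a single "Nx cheaper" figure depends on the mix of a given workload.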

What You Get for the Price

The benchmark scores quoted for V3.5 sit where you'd expect a budget model to land, and none of them could be verified against a real model. MMLU-Pro: 76.8%. HumanEval: 81.4%. MATH: 63.2%. Not headline numbers, but mid-tier, in the range of models said to cost 5 to 10 times more. SWE-bench: 48.7%, which on paper means routine coding is fine but anything genuinely hard in software engineering will trip it up. Worth noting: the real DeepSeek line reportedly scores higher than this, so these figures may describe nothing that shipped.

The pitch is that V3.5 earns its keep where volume matters more than peak smarts: content moderation, document classification, data extraction, customer service automation, and other jobs where mid-tier quality is enough and the cost gap does the heavy lifting. The example in the draft: a company pushing 100 million input tokens a day would spend $15 on V3.5 against $350 on Gemini 3.5 Flash or $500 on GPT-5.5. The Gemini number doesn't add up: 100 million tokens at the disputed $0.35 input price is $35, and at the documented $1.50 it is $150. Even so, if the pricing held, that's the kind of gap that changes whether a use case is viable at all.
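The daily-spend comparison can be reproduced with a few lines of arithmetic. This is a rough sketch under the article's own assumptions: 100 million input tokens a day, input-side pricing only, and the per-million prices quoted above (the V3.5 price is unconfirmed). The function name `daily_input_cost` and the dictionary are hypothetical.

```python
TOKENS_PER_DAY = 100_000_000  # the article's example volume, input tokens only

# USD per million input tokens, as quoted in this article.
INPUT_PRICE_PER_MILLION = {
    "DeepSeek V3.5 (claimed, unconfirmed)": 0.15,
    "Gemini 3.5 Flash (draft's disputed figure)": 0.35,
    "Gemini 3.5 Flash (documented)": 1.50,
    "GPT-5.5": 5.00,
}


def daily_input_cost(tokens_per_day: int, usd_per_million: float) -> float:
    """Daily input spend in USD for a given volume and per-million-token price."""
    return tokens_per_day / 1_000_000 * usd_per_million


# Show the daily bill for each model at the example volume.
for model, price in INPUT_PRICE_PER_MILLION.items():
    cost = daily_input_cost(TOKENS_PER_DAY, price)
    print(f"{model}: ${cost:,.2f} per day")
```

A real budget would also need the output side, which this sketch leaves out because the draft's example only covers inputs.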

The Context Window

The 1-million-token window is the headline feature at this price. No other sub-dollar model is said to offer long context, which is what would make V3.5 useful for jobs like reading an entire book or legal case file in one pass, working through months of customer-support history, or scanning a small-to-medium codebase whole.

Needle-in-a-haystack testing at 1M tokens reportedly shows 93% retrieval accuracy, said to be just under MiniMax M3's 97% and Gemini's 95%, though no source is given for any of these figures and we couldn't confirm them. The model is also said to lose some coherence at the far end of the window, dropping off more noticeably past 600K tokens than rivals do. Again, unverified.

Deployment and Infrastructure

DeepSeek is described as offering V3.5 through its API and as open weights. The open-weights version is said to use a Mixture-of-Experts (MoE) design with 37 billion active parameters out of 236 billion total. That spec looks scrambled: the real DeepSeek V3 family runs 671B total with 37B active, and 236B was the total for the older V2. The draft's comparisons are shaky too. It cites GLM-5.2's "753B dense architecture", but GLM-5.2 is actually an MoE model with roughly 744B total and around 40B active parameters, not dense. It also cites 32B active parameters for MiniMax M3, which we couldn't confirm either.

The MoE approach would make V3.5 cheaper to run than a dense model of the same strength, but a 1M-token window still wants serious hardware. Self-hosting with full context is said to need roughly 8x H100 GPUs, a figure that is unverified and tied to a model we can't confirm exists. Several cloud providers reportedly host it, with Together AI and Fireworks named as competitively priced; both are real DeepSeek hosts.
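A back-of-envelope check on the hardware claim is possible for the weights alone. The sketch below assumes 80 GB H100 cards and a typical serving setup in which every expert of an MoE model stays resident in GPU memory, so total parameters rather than active ones set the floor. It ignores the KV cache, activations and framework overhead, which for a 1M-token window could be substantial and cannot be estimated without architecture details nobody has published. The function names are hypothetical.

```python
import math

H100_MEMORY_GB = 80  # capacity of the 80 GB H100 variant (assumed hardware)


def weight_memory_gb(total_params_billion: float, bytes_per_param: float) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    # 1e9 parameters * bytes per parameter / 1e9 bytes per GB cancels out.
    return total_params_billion * bytes_per_param


def min_gpus_for_weights(
    total_params_billion: float,
    bytes_per_param: float,
    gpu_memory_gb: float = H100_MEMORY_GB,
) -> int:
    """Lower bound on GPU count: weights only, no KV cache or overhead."""
    return math.ceil(weight_memory_gb(total_params_billion, bytes_per_param) / gpu_memory_gb)


# 236B total is the draft's claimed size; 671B is the documented V3 family size.
for label, params in [("claimed 236B", 236), ("V3-family 671B", 671)]:
    for precision, nbytes in [("16-bit", 2), ("8-bit", 1)]:
        gpus = min_gpus_for_weights(params, nbytes)
        print(f"{label} at {precision}: at least {gpus} x 80 GB GPUs for weights")
```

Treat the result as a floor rather than a sizing recommendation: whatever headroom remains after the weights is what the long-context cache has to fit into.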

The Hidden Costs of Cheap AI

Cheap raises questions about where the savings come from. DeepSeek hasn't published much about its training compute costs or the data behind V3.5. Its training mix is known to lean heavily on Chinese-language content, which can leak odd biases into English output.

Speed is the other catch. V3.5's generation rate is reportedly about 95 tokens per second, fine for most things, but slower than Gemini 3.5 Flash's claimed 180. Neither figure is sourced, and for anything real-time that gap would matter if it's accurate.
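To see what that gap would mean in practice, the two claimed rates can be turned into wall-clock time per response. This is a simplified sketch: it assumes a constant generation rate, ignores time-to-first-token and network latency, and uses the unsourced 95 and 180 tokens-per-second figures quoted above. The function name `generation_seconds` is hypothetical.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to stream a response at a constant generation rate."""
    return output_tokens / tokens_per_second


# Claimed, unsourced generation rates from the article.
CLAIMED_RATES = {
    "DeepSeek V3.5 (claimed)": 95,
    "Gemini 3.5 Flash (claimed)": 180,
}

# Compare a short chat-style reply with a long generated document.
for tokens in (200, 1000):
    for model, rate in CLAIMED_RATES.items():
        seconds = generation_seconds(tokens, rate)
        print(f"{tokens} tokens on {model}: {seconds:.1f} s")
```

For batch jobs the difference mostly disappears into throughput, while for a user waiting on a long answer it is several seconds per response.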

The bigger issue is regulatory. DeepSeek is a Chinese company, and that's a live risk for enterprise buyers. Washington hasn't moved against DeepSeek models specifically, but the recent Fable 5 ban, where the US ordered Anthropic to cut off access to Fable 5 and Mythos 5 for foreign nationals on national-security grounds, shows Chinese-linked AI can become a target fast. If you're putting critical infrastructure on a model like this, that's worth pricing in.

Source trail

Primary references to keep this briefing grounded

AI and automation information changes quickly. Use these official or primary references to verify the claims, pricing, product behaviour, and compliance details before committing budget or production data.

DeepSeek API documentation

What to do next

Write the job-to-be-done before looking at another product.
Score each shortlisted tool for workflow fit, data handling, cost, and owner readiness.
Run one small pilot and remove anything the team does not use weekly.

AI Kick Start is an Illawarra-based AI studio in Figtree, helping businesses across Wollongong, Shellharbour and Kiama and right across Australia put AI to work.
