Analysis
A note before you read on. We went looking for the model this article is about and could not confirm it exists. DeepSeek's own changelog lists V3.2 in December 2025, then the V4 family in April 2026, no "V3.5", and no 20 March 2026 release. Several of the prices and benchmark figures quoted here also don't line up with any DeepSeek model we can find documentation for.
So read this as a report on a set of circulating claims about a budget Chinese model, not as a spec sheet you can buy against. We've kept every number the original draft carried, but flagged the unconfirmed ones as exactly that. Where a fact does check out, who funds DeepSeek, what the big rivals actually charge, the recent US export action, we've linked the source.
Why bother running it at all? Because the underlying story is real and worth understanding. DeepSeek has spent two years undercutting everyone else on price, and a sub-dollar model with a million-token window would genuinely shift the maths for high-volume work. If a model like the one described below ships and the pricing is anywhere near accurate, plenty of Australian teams will want to know what they'd be trading away to get it.
DeepSeek has built its name on one thing: being cheap. The Chinese lab is funded by the quantitative trading firm High-Flyer), and its models have repeatedly come in 5 to 10 times under competitors while staying usable. The reportedly-released V3.5 is described as that strategy pushed to the limit.
At a claimed $0.15 per million input tokens and $0.60 per million output tokens, V3.5 would be the cheapest model from any major lab. On those figures it's pitched as 23x cheaper than GPT-5.5 ($5/$30) and 33x cheaper than Claude Opus 4.8 ($5/$25), and those two competitor prices do check out. The draft also claims it's 2x cheaper than Gemini 3.5 Flash at $0.35/$0.70; that Gemini figure looks wrong, since Gemini 3.5 Flash is documented at $1.50/$9.00 per million tokens, not $0.35/$0.70. And unlike most budget models, V3.5 is said to ship a 1-million-token context window, the same range as MiniMax M3, Gemini 3.5 Flash, and Gemini 3.1 Pro, which do all run 1M context.
What You Get for the Price
The benchmark scores quoted for V3.5 sit where you'd expect a budget model to land, and none of them could be verified against a real model. MMLU-Pro: 76.8%. HumanEval: 81.4%. MATH: 63.2%. Not headline numbers, but mid-tier, in the range of models said to cost 5 to 10 times more. SWE-bench: 48.7%, which on paper means routine coding is fine but anything genuinely hard in software engineering will trip it up. Worth noting: the real DeepSeek line reportedly scores higher than this, so these figures may describe nothing that shipped.
The pitch is that V3.5 earns its keep where volume matters more than peak smarts. Content moderation, document classification, data extraction, customer service automation, jobs where mid-tier quality is enough and the cost gap does the heavy lifting. The example in the draft: a company pushing 100 million tokens a day would spend $15 on V3.5 inputs against $350 on Gemini 3.5 Flash or $500 on GPT-5.5. (Note the Gemini comparison rests on the disputed $0.35 input figure above.) If the pricing held, that's the kind of gap that changes whether a use case is viable at all.

The Context Window
The 1-million-token window is the headline feature at this price. No other sub-dollar model is said to offer long context, which is what would make V3.5 useful for jobs like reading an entire book or legal case file in one pass, working through months of customer-support history, or scanning a small-to-medium codebase whole.
Needle-in-a-haystack testing at 1M tokens reportedly shows 93% retrieval accuracy, said to be just under MiniMax M3's 97% and Gemini's 95%, though no source is given for any of these figures and we couldn't confirm them. The model is also said to lose some coherence at the far end of the window, dropping off more noticeably past 600K tokens than rivals do. Again, unverified.
Deployment and Infrastructure
DeepSeek is described as offering V3.5 through its API and as open weights. The open-weights version is said to use a Mixture-of-Experts design with 37 billion active parameters out of 236 billion total. That spec looks scrambled: the real DeepSeek V3 family runs 671B total with 37B active, and 236B was the total for the older V2. The draft compares it to GLM-5.2's "753B dense architecture", but GLM-5.2 is actually a ~744B-total MoE model with around 40B active, not dense, and to MiniMax M3's reported 32B active, which we couldn't confirm either.
The MoE approach would make V3.5 cheaper to run than a dense model of the same strength, but a 1M-token window still wants serious hardware. Self-hosting with full context is said to need roughly 8x H100 GPUs, unverified, and tied to a model we can't confirm exists. Several cloud providers reportedly host it, with Together AI and Fireworks named as competitively priced; both are real DeepSeek hosts.


