Gemini 3.5 Flash review: Frontier performance at Flash pricing
Release date: 19 May 2026 | Status: Active | Licence: Closed
Analysis
If you run AI features inside a business, the question is rarely "which model wins the leaderboard." It's "what can I run a lot of, cheaply, without the output falling apart." That is the gap Gemini 3.5 Flash is built to fill.
Google shipped it on 19 May 2026 at Google I/O, and made it available straight away across the Gemini API, AI Studio, Vertex AI, and the Gemini app (LLM-Stats, Gemini 3.5 Flash launch specs). The headline feature is a 1M-token context window: you can feed it a whole contract bundle or a large codebase in one go, which smaller-context models simply can't do.
A word of caution before the numbers. Some of the early coverage built its whole "unbeatable value" case on a price of $0.35 input / $0.70 output per million tokens. That figure does not hold up. Google's published standard-tier pricing is $1.50 per million input tokens and $9.00 per million output tokens (cached input around $0.15), per LLM-Stats. That's roughly 4.3 times the input price and almost 13 times the output price of the cheap number doing the rounds. Flash is still affordable for its tier, but it isn't the giveaway some reviews claimed, and any cost comparison built on the lower figure falls apart.
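To make the size of that gap concrete, here is a minimal sketch of the arithmetic. It uses only the two sets of prices quoted above; the variable and function names are illustrative, not part of any Google tooling.

```python
# Prices in USD per million tokens, as quoted in this review.
CLAIMED = {"input": 0.35, "output": 0.70}    # figure from some early coverage
PUBLISHED = {"input": 1.50, "output": 9.00}  # standard tier, per LLM-Stats


def price_ratio(published: float, claimed: float) -> float:
    """Return how many times larger the published price is than the claimed one."""
    return published / claimed


ratios = {kind: price_ratio(PUBLISHED[kind], CLAIMED[kind]) for kind in CLAIMED}

for kind, ratio in ratios.items():
    print(f"{kind}: {ratio:.1f}x the claimed price")
# prints:
# input: 4.3x the claimed price
# output: 12.9x the claimed price
```

Any budget drawn up on the lower figure is off by the same factors, and since output tokens carry the larger multiplier, generation-heavy workloads are hit hardest.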
So treat this review as two things at once: a genuinely capable model worth testing, and a reminder to check the price page yourself before you build a budget around a blog post.
Benchmarks at a glance
A note on the table below: the context window and the release facts are confirmed. The benchmark scores and the lower price figures come from the original write-up and could not be verified against Google's launch materials or the main aggregators, so read them as the author's claimed figures rather than settled fact.
| Metric | Value | Notes |
|---|---|---|
| SWE-bench Pro | 48.2% (reported; aggregators list ~55.1%) | Competitive at this price |
| MMLU | 86.8% (unconfirmed) | 1.6 pts behind GPT-5.5 (unconfirmed) |
| Context window | 1M tokens | Best-in-class |
| Price (input) | Officially $1.50 / 1M tokens (some reviews cite $0.35) | Cached input around $0.15 |
| Price (output) | Officially $9.00 / 1M tokens (some reviews cite $0.70) | n/a |
The verified spec to anchor on is the 1M-token context window (1,048,576 input tokens, 64K output). Everything price- and score-related below carries more uncertainty.
The value equation
Here's where the original review overreached. It argued no other model touches Flash on price-to-performance, then ran the maths off the $0.35/$0.70 figure: less than half what a "DeepSeek V3.5" charges, one-seventh of GPT-5.5 Instant, one-seventeenth of Sonnet 4.6.
With the real $1.50/$9.00 pricing, that arithmetic collapses. Claude Sonnet 4.6 is confirmed at $3 input / $15 output per million tokens (Anthropic API pricing 2026), so against verified numbers Flash is roughly half the input cost and a bit over half the output cost of Sonnet, not one-seventeenth. The DeepSeek comparison is shakier still: current 2026 coverage points to DeepSeek V4 (a V4 Flash tier reportedly around $0.14 input / $0.28 output), and the "V3.5" at $0.15/$0.60 with 85.8% MMLU cited here could not be confirmed (2026 LLM API pricing comparison).
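The Sonnet comparison is easy to check by hand. The sketch below simply divides the two published price lists quoted above; the dictionary and function names are made up for illustration.

```python
# USD per million tokens, as quoted in this review.
FLASH = {"input": 1.50, "output": 9.00}
SONNET_4_6 = {"input": 3.00, "output": 15.00}


def fraction_of(model: dict[str, float], baseline: dict[str, float]) -> dict[str, float]:
    """Express each of the model's prices as a fraction of the baseline's price."""
    return {kind: model[kind] / baseline[kind] for kind in model}


share = fraction_of(FLASH, SONNET_4_6)

for kind, value in share.items():
    print(f"{kind}: Flash costs {value:.0%} of Sonnet 4.6")
# prints:
# input: Flash costs 50% of Sonnet 4.6
# output: Flash costs 60% of Sonnet 4.6
```

One-seventeenth would be about 6%, so the verified figures put Flash nowhere near the original claim.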
What does survive is the practical point underneath the bad maths: the 1M context window changes what you can attempt. Flash can ingest documents and codebases that 128K models can't even load, and at its real price that's still a reasonable rate for high-volume, large-input work.
Where Flash excels
Document analysis. The big context window plus a sane price makes Flash a strong default for RAG-style apps, legal document review, and large-scale content analysis. Feeding it a million tokens of input is the kind of workload that used to be too expensive to bother with; at Flash's real rate it becomes worth costing out properly.
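Costing a workload out properly can be as simple as the sketch below. It assumes the standard-tier prices and the 1,048,576-token input limit quoted in this review, ignores cached-input discounts, and uses a hypothetical batch of contract bundles; `Job` and `estimate_cost` are illustrative names, not part of any SDK.

```python
from dataclasses import dataclass

# USD per million tokens; standard-tier figures quoted in this review.
INPUT_PRICE = 1.50
OUTPUT_PRICE = 9.00
MAX_INPUT_TOKENS = 1_048_576  # input context window quoted in this review


@dataclass
class Job:
    requests: int       # number of API calls in the batch
    input_tokens: int   # tokens sent per call
    output_tokens: int  # tokens generated per call


def estimate_cost(job: Job) -> float:
    """Estimate the USD cost of the whole batch, ignoring caching discounts."""
    if job.input_tokens > MAX_INPUT_TOKENS:
        raise ValueError("input exceeds the 1M-token context window")
    per_call = (
        job.input_tokens * INPUT_PRICE + job.output_tokens * OUTPUT_PRICE
    ) / 1_000_000
    return job.requests * per_call


# Hypothetical workload: 200 contract bundles, ~800K tokens in and ~4K tokens out each.
batch = Job(requests=200, input_tokens=800_000, output_tokens=4_000)
print(f"${estimate_cost(batch):,.2f}")  # prints $247.20
```

Swap in your own request counts and token sizes, and re-check the two price constants against Google's pricing page before relying on the result.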
General knowledge. The original review put Flash at 86.8% MMLU, ahead of GPT-5.5 Instant (84.2%) and 1.6 points behind GPT-5.5 (88.4%). Worth flagging: none of those MMLU figures could be confirmed; Google's launch coverage leaned on coding and agentic benchmarks (Terminal-Bench, MCP Atlas, Finance Agent) rather than MMLU; and the GPT-5.5 scores are unverified too. The general takeaway still stands directionally: for Q&A, summarising, and content generation, a model in this tier is usually good enough that small benchmark gaps don't show up in the work.
Multilingual tasks. As with the rest of the Gemini line, Flash is reportedly strong across languages, especially Asian and European ones, and tends to beat English-centric models on non-English tests. Useful if your audience isn't all in English.
Where it falls short
Complex coding. On SWE-bench Pro the review cited 48.2%, well below Sonnet 4.6 (reported 58.1%) and Opus 4.8 (reported 69.2%). Two caveats: aggregators actually list Flash closer to 55.1% on that benchmark, and the Sonnet/Opus figures couldn't be confirmed. Either way, the pattern is believable: Flash handles routine coding fine but gets stretched by multi-file changes, gnarly debugging, and genuinely novel algorithm work. For that tier of task, reach for a heavier model.
Reasoning depth. On harder reasoning tests like ARC-AGI-2, Flash reportedly trails models that score higher on knowledge benchmarks, and is less reliable at multi-step deduction and abstract pattern work. Keep it away from problems that need long chains of careful logic.
Verdict
Gemini 3.5 Flash is a capable, affordable model with a standout context window, and it's worth shortlisting if you're building features that chew through large inputs at volume. What it is *not* is the no-brainer "best value in June 2026" some reviews declared: that verdict was built on a price ($0.35/$0.70) that isn't real. At the actual $1.50/$9.00, several Flash- and Lite-tier models and DeepSeek-class options come in cheaper, so the superlative doesn't hold (LLM-Stats).
The honest summary: Flash isn't the best at any one thing, but it's solid at most things, and the context window earns its keep. Just price your workload against Google's published rates (check the Gemini Flash page) before you commit a budget to it.


