Claude Sonnet 4.6 review: Opus-level intelligence at half the price
Release date: 17 February 2026 | Status: Active | Licence: Closed
On 17 February 2026, Anthropic shipped Claude Sonnet 4.6, and the pitch is simple: most of the smarts of its top model for a lot less money. For business teams already paying per token, that pitch lands where it matters.
The model sits in the middle of Anthropic's range, between the premium Opus line and the cheaper Haiku versions. It runs at $3.00 input / $15.00 output per million tokens, which works out to 40% cheaper than Opus 4.8 (CloudZero, Claude Opus 4.8 pricing). The headline "half the price" is loose marketing: against Opus 4.8 the saving is 40%, while against the older premium Opus tier Sonnet 4.6 costs roughly one-fifth as much (VentureBeat, Sonnet 4.6 at one-fifth the cost).
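The 40% figure is easy to check from the listed prices. The snippet below is a minimal sketch of that arithmetic; the dictionary and function names are ours, and the prices are simply the per-million-token figures quoted in this review.

```python
# Per-million-token prices in USD, as quoted in this review.
SONNET_4_6 = {"input": 3.00, "output": 15.00}
OPUS_4_8 = {"input": 5.00, "output": 25.00}


def saving_pct(cheaper: float, pricier: float) -> float:
    """Return how much cheaper `cheaper` is than `pricier`, as a percentage."""
    return 100 * (pricier - cheaper) / pricier


for kind in ("input", "output"):
    pct = saving_pct(SONNET_4_6[kind], OPUS_4_8[kind])
    print(f"{kind}: Sonnet 4.6 is {pct:.0f}% cheaper than Opus 4.8")
```

Both input and output come out at 40%, so the discount is the same whatever mix of prompt and completion tokens a workload uses.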
The "so what" for a business team: for general knowledge work, the gap between Sonnet and Opus is small enough that you probably won't notice it. For heavy coding, the gap is real. The rest of this review walks through where each is true.
Benchmarks at a glance
| Metric | Sonnet 4.6 | Opus 4.8 | Delta |
|---|---|---|---|
| SWE-bench Pro | 58.1% | 69.2% | -11.1 pts |
| MMLU | 87.6% | 89.8% | -2.2 pts |
| Context window | 1M (beta) | 1M (beta) | none |
| Price (input) | $3.00 / 1M | $5.00 / 1M | -40% |
| Price (output) | $15.00 / 1M | $25.00 / 1M | -40% |
A caveat on the coding row. Opus 4.8's 69.2% on SWE-bench Pro checks out against the public leaderboard. The 58.1% figure for Sonnet 4.6 is harder to stand behind: Anthropic reports Sonnet 4.6 on SWE-bench Verified (around 79.6%), not SWE-bench Pro, and we could not find a Pro score for the model anywhere. Treat that delta as indicative, not gospel. The MMLU numbers are in line with figures circulating in third-party comparison data (LLM-Stats, Sonnet 4.6 vs Opus 4.8), but the exact paired values aren't confirmed by a primary source.
Where Sonnet 4.6 shines
Value for money. A 2.2-point MMLU gap means Sonnet 4.6 knows nearly as much as Opus 4.8 for general Q&A, document analysis, and summarisation. On a lot of production work, you'd be hard pressed to tell which model wrote the answer.
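To see what the price gap means in dollars, here is a rough cost estimator. It is a sketch under stated assumptions: the prices are the ones listed in this review, and the monthly token volumes are invented purely for illustration, so substitute your own usage figures.

```python
# (input price, output price) in USD per million tokens, as quoted in this review.
PRICES = {
    "Sonnet 4.6": (3.00, 15.00),
    "Opus 4.8": (5.00, 25.00),
}


def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate monthly spend in USD for a given token volume."""
    in_price, out_price = PRICES[model]
    return (input_tokens / 1_000_000) * in_price + (output_tokens / 1_000_000) * out_price


# Hypothetical workload: 500M input tokens and 100M output tokens per month.
INPUT_TOKENS = 500_000_000
OUTPUT_TOKENS = 100_000_000

for model in PRICES:
    cost = monthly_cost(model, INPUT_TOKENS, OUTPUT_TOKENS)
    print(f"{model}: ${cost:,.2f} per month")
```

On that made-up workload the estimate is $3,000 a month for Sonnet 4.6 against $5,000 for Opus 4.8. The sketch ignores caching, batch discounts, and any long-context surcharges, which may change the real bill.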
Speed. In our testing, Sonnet 4.6 returned first tokens faster than Opus 4.8 and sustained higher throughput. These are our own observations rather than independently verified numbers, but smaller Claude models tend to be quicker than Opus, so the direction tracks with Anthropic's own positioning. It suits real-time apps and high-volume jobs where latency adds up.
Context window. Anthropic says Sonnet 4.6 includes a 1M-token context window in beta, matching Opus. Worth knowing: at least one aggregator lists the default input window at 200K, so the 1M figure looks like a beta or opt-in tier rather than the standard setting. With that caveat, it opens up large-document analysis and whole-codebase reading that used to mean reaching for the top tier.
Where it lags
Complex coding. The roughly 11-point SWE-bench Pro gap, if the unverified Sonnet figure holds, is the part you feel. Sonnet 4.6 handles routine coding fine: boilerplate, simple debugging, documentation. It gets shakier on multi-file refactors, gnarly algorithmic problems, and vague specs. If serious software engineering is the job, Opus 4.8 earns its premium.
Reasoning depth. On harder reasoning tasks, Sonnet 4.6 reportedly slips further behind Opus 4.8 than the MMLU gap implies, and looks less dependable on multi-step deduction. We'll flag this as unconfirmed: no published ARC-AGI-2 scores for the Sonnet 4.6 / Opus 4.8 pairing exist, so this read is directional rather than measured.
The sweet spot
Sonnet 4.6 fits customer support chatbots, document summarisation, content moderation, basic code review, and anything where speed and cost beat squeezing out the last drop of reasoning. It's Anthropic's best-balanced model.
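One way a team might act on this split is a simple routing table that sends each kind of job to a model tier. The sketch below is illustrative only: the task categories are taken from this review, while the `ROUTES` table, the `pick_model` helper, and the tier labels are hypothetical names of ours, not real API model identifiers.

```python
# Plain labels for the two tiers; these are NOT real API model identifiers.
SONNET = "sonnet-4.6"
OPUS = "opus-4.8"

# Task categories drawn from this review, mapped to the tier it recommends.
ROUTES = {
    "customer_support": SONNET,
    "summarisation": SONNET,
    "content_moderation": SONNET,
    "basic_code_review": SONNET,
    "multi_file_refactor": OPUS,
    "algorithmic_problem": OPUS,
}


def pick_model(task: str) -> str:
    """Return the tier for a task, defaulting to the cheaper model."""
    return ROUTES.get(task, SONNET)


print(pick_model("summarisation"))
print(pick_model("multi_file_refactor"))
print(pick_model("something_unlisted"))
```

Defaulting unknown tasks to the cheaper tier mirrors the review's view that Sonnet 4.6 is the sensible starting point; a team doing mostly heavy engineering work might reasonably flip that default.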
Verdict
For most Anthropic users, Sonnet 4.6 is the sensible default. Unless you genuinely need the best coding performance available, the 40% saving outweighs the capability you give up. It's the model we'd reach for first on new Anthropic integrations.
Score: 8.4 / 10 (our editorial rating, not a benchmarked figure)


