Model Review

Claude Opus 4.8 review: The current Anthropic workhorse.

Claude Opus 4.8 launched 28 May 2026 with 69.2% on SWE-bench Pro, an unconfirmed 89.8% on MMLU, and a 1M-token context window in beta. At $5/$25 per million input/output tokens, it is the best widely available model from Anthropic.

Decision

Shortlist

Score tools by workflow fit, data handling, owner readiness, and cost at scale before buying seats.

Risk to watch

Shelfware

A capable tool still fails if nobody owns the workflow or checks whether it is used weekly.

Proof to collect

Pilot score

Run one real task through each shortlisted tool and record quality, time saved, and support burden.

Key takeaways

  • **Release date:** 28 May 2026 | **Status:** Active | **Licence:** Closed. When Anthropic [pulled Claude Fable 5 and Mythos 5 offline in mid-June](https://www.anthropic.com/news/claude-fable-5-mythos-5) after a US export-control directive, it left a gap at the top of its own line-up, and Opus 4.8 stepped into it.
  • Benchmarks at a glance: SWE-bench Pro 69.2% (+4.9 pts vs Opus 4.7); MMLU 89.8% (+0.6 pts, unconfirmed); context window 1M tokens (beta, unchanged); price $5.00 input / $25.00 output per 1M tokens (unchanged). The coding number is the one to watch.
  • Where Opus 4.8 excels: **Software engineering.** At 69.2% on SWE-bench Pro, Opus 4.8 sits near the top of the coding pack in June 2026.
  • Where it falls short: **Price.** At $5/$25, Opus 4.8 is expensive for anything high-volume.
  • Verdict: Claude Opus 4.8 is the best general-purpose model Anthropic offers right now.

Release date: 28 May 2026 | Status: Active | Licence: Closed

When Anthropic pulled Claude Fable 5 and Mythos 5 offline in mid-June after a US export-control directive, it left a gap at the top of its own line-up. The model that stepped into it is Claude Opus 4.8, released on 28 May 2026. For most teams, that makes Opus 4.8 the practical question: it's the best model Anthropic currently lets you actually use.

The short version for a business reader: this is a genuinely strong model that costs real money. At $5 per million input tokens and $25 per million output, it's priced for work where quality pays for itself, not for high-volume grunt tasks. If you write a lot of code, read long documents, or need a model that follows detailed instructions without drifting, it earns the bill. If you're processing millions of tokens a day on routine work, it'll hurt.

The rest of this review walks through where it's worth the spend and where a cheaper model does the job just as well.

Benchmarks at a glance

| Metric | Score | vs Opus 4.7 |
| --- | --- | --- |
| SWE-bench Pro | 69.2% | +4.9 pts |
| MMLU (unconfirmed) | 89.8% | +0.6 pts |
| Context window | 1M tokens (beta) | Same |
| Price (input) | $5.00 / 1M tokens | Same |
| Price (output) | $25.00 / 1M tokens | Same |

The coding number is the one to watch. Opus 4.8 scores 69.2% on SWE-bench Pro, up from Opus 4.7's 64.3%, a gain of 4.9 points. That puts it well clear of Gemini 3.1 Pro at 54.2%, and ahead of GPT-5.5 on vendor-reported scores, though the GPT-5.5 comparison is contested: some leaderboards rank GPT-5.5 above Opus 4.8 depending on the variant tested, and we couldn't confirm the 62.4% figure cited for "GPT-5.5 Pro". The MMLU line is harder to stand behind. Anthropic didn't publish an MMLU score for Opus 4.8, and neither the 89.8% figure nor its +0.6 delta could be verified against any source, so treat both as unconfirmed.

Where Opus 4.8 excels

Software engineering. At 69.2% on SWE-bench Pro, Opus 4.8 sits near the top of the coding pack in June 2026. On vendor-reported numbers it trails only the now-suspended Fable 5, though that "second-best" ranking depends on which leaderboard you read; some put GPT-5.5 ahead. Either way, it handles multi-file refactoring, test generation, and debugging with a consistency that cheaper models don't hold. The 1M-token context window (still in beta) means it can take in a large codebase in one go.

Long-context reasoning. That 1M beta window pays off in document analysis, legal review, and reading across a whole codebase. We ran it against a 400,000-token legal brief and, in our own testing, it held cross-references accurately the whole way through where 128K models lost the thread. That's a single internal test, not an independent benchmark, so weigh it accordingly.
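To make the window comparison concrete, here is a minimal sketch of how many passes a document would need at each context size. The 1M and 128K limits and the 400,000-token brief come from the text above; the `OUTPUT_RESERVE` value and the `chunks_needed` helper are hypothetical, and the sketch ignores overlap between chunks, which a real pipeline would likely add.

```python
import math

OPUS_4_8_WINDOW = 1_000_000   # beta context window quoted in this review
SMALLER_WINDOW = 128_000      # the "128K models" used for comparison
OUTPUT_RESERVE = 8_000        # hypothetical: tokens held back for the model's reply


def chunks_needed(document_tokens: int, window: int,
                  output_reserve: int = OUTPUT_RESERVE) -> int:
    """Return how many separate passes a document needs for a given window.

    Assumes simple non-overlapping chunks; cross-references that span
    chunk boundaries are exactly what gets lost when this exceeds 1.
    """
    usable = window - output_reserve
    if usable <= 0:
        raise ValueError("output_reserve must be smaller than the window")
    return math.ceil(document_tokens / usable)


brief_tokens = 400_000  # size of the legal brief from our test

single_pass = chunks_needed(brief_tokens, OPUS_4_8_WINDOW)
split_passes = chunks_needed(brief_tokens, SMALLER_WINDOW)

print(f"1M window:   {single_pass} pass(es)")
print(f"128K window: {split_passes} pass(es)")
```

Under these assumptions the brief fits in one pass at 1M tokens but has to be split several ways at 128K, which is one plausible reason cross-references degrade on the smaller windows.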

Instruction following. Opus 4.8 is clearly better than Opus 4.7 at handling complex instructions with several constraints at once. It rarely invents formatting rules or quietly drops a constraint you set.

Where it falls short

Price. At $5/$25, Opus 4.8 is expensive for anything high-volume. A startup pushing 10M input tokens a day spends $50 a day on input alone, around $1,500 a month before output costs even enter the picture. For comparison, MiniMax M3 at $0.30/$1.20 handles plenty of the same work at about 6% of the input price, roughly one-seventeenth.
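The arithmetic behind those figures can be sketched in a few lines. The per-million prices and the 10M-tokens-a-day volume are taken from the paragraph above; the `Pricing` class, the `daily_cost` helper, and the 30-day month are illustrative assumptions, not anything either vendor publishes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Per-million-token prices in US dollars."""
    input_per_million: float
    output_per_million: float


def daily_cost(pricing: Pricing, input_tokens: int, output_tokens: int = 0) -> float:
    """Cost in dollars for one day's token volume."""
    return (input_tokens / 1_000_000 * pricing.input_per_million
            + output_tokens / 1_000_000 * pricing.output_per_million)


OPUS_4_8 = Pricing(input_per_million=5.00, output_per_million=25.00)
MINIMAX_M3 = Pricing(input_per_million=0.30, output_per_million=1.20)

DAYS_PER_MONTH = 30           # assumption: a flat 30-day month
DAILY_INPUT_TOKENS = 10_000_000

opus_daily = daily_cost(OPUS_4_8, DAILY_INPUT_TOKENS)
opus_monthly = opus_daily * DAYS_PER_MONTH
minimax_monthly = daily_cost(MINIMAX_M3, DAILY_INPUT_TOKENS) * DAYS_PER_MONTH
input_price_ratio = MINIMAX_M3.input_per_million / OPUS_4_8.input_per_million

print(f"Opus 4.8 input:   ${opus_daily:.2f}/day, ${opus_monthly:.2f}/month")
print(f"MiniMax M3 input: ${minimax_monthly:.2f}/month")
print(f"MiniMax M3 input price as a share of Opus 4.8: {input_price_ratio:.0%}")
```

Passing a non-zero `output_tokens` value shows how quickly the $25 output rate comes to dominate the bill for generation-heavy workloads.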

Speed. Opus 4.8 isn't slow, but it isn't quick either. For anything latency-sensitive, such as live chat or streaming suggestions, Sonnet 4.6 or GPT-5.5 Instant is the better fit.

Closed weights. Like every Anthropic model, Opus 4.8 can't be self-hosted. That rules it out for air-gapped environments and for organisations whose data-residency rules require hosting on their own infrastructure.

Verdict

Claude Opus 4.8 is the best general-purpose model Anthropic offers right now. It isn't the cheapest or the fastest, but it's the most capable one you can readily get your hands on. If your budget can absorb $5/$25 pricing and you need top-tier coding or reasoning, it's the sensible default.

Score: 8.7 / 10

What to do next

  1. Write the job-to-be-done before looking at another product.
  2. Score each shortlisted tool for workflow fit, data handling, cost, and owner readiness.
  3. Run one small pilot and remove anything the team does not use weekly.
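One way to make step 2 repeatable is a simple weighted score. The sketch below uses the four criteria named in the list; the weights, the 1-to-5 scale, the tool names, and the sample scores are all hypothetical placeholders to be replaced with your own pilot results.

```python
# Hypothetical weights: higher means the criterion matters more to the team.
CRITERIA_WEIGHTS = {
    "workflow_fit": 2,
    "data_handling": 1,
    "cost": 1,
    "owner_readiness": 1,
}


def weighted_score(scores: dict, weights: dict = CRITERIA_WEIGHTS) -> float:
    """Weighted average of per-criterion scores (same scale as the inputs)."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(scores[name] * weight for name, weight in weights.items()) / total_weight


# Placeholder pilot scores on a 1 (poor) to 5 (strong) scale.
shortlist = {
    "Tool A": {"workflow_fit": 4, "data_handling": 3, "cost": 2, "owner_readiness": 5},
    "Tool B": {"workflow_fit": 3, "data_handling": 5, "cost": 4, "owner_readiness": 2},
}

# Highest weighted score first.
ranked = sorted(shortlist, key=lambda name: weighted_score(shortlist[name]), reverse=True)

for name in ranked:
    print(f"{name}: {weighted_score(shortlist[name]):.2f}")
```

The number matters less than writing the weights down before the pilot, so the ranking can't be bent to fit a favourite afterwards.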

AI Kick Start is an Illawarra-based AI studio in Figtree, helping businesses across Wollongong, Shellharbour and Kiama and right across Australia put AI to work.
