GPT-5.5 Instant review: The default ChatGPT model tested
Release date: 5 May 2026 | Status: Active | Licence: Closed
When you open ChatGPT and start typing, the model answering you is almost certainly GPT-5.5 Instant. OpenAI shipped it as the new default on 5 May 2026, which means it quietly became the model hundreds of millions of people use without ever choosing it.
That makes it worth a proper look, not because it tops any leaderboard, but because it sets the baseline for what "AI" feels like to most people. It is built to be fast and cheap rather than to win benchmarks, and OpenAI says it hallucinates noticeably less on touchy subjects like law, medicine, and finance than the version it replaced.
One caveat up front for Australian teams reading this: several of the specific numbers quoted for the Instant variant (pricing and benchmark scores) could not be confirmed against OpenAI's published figures, which describe the standard GPT-5.5 model rather than Instant. We've flagged those where they appear below. The short version: it's a good everyday model, but check the live pricing page before you build a budget around it.
Benchmarks at a glance
| Metric | Reported value | Notes |
|---|---|---|
| SWE-bench Pro | 42.1% | Entry-level coding |
| MMLU | 84.2% | Solid general knowledge |
| Context window | 128K tokens | Standard, not exceptional |
| Price (input) | $0.50 / 1M tokens | Very affordable |
| Price (output) | $1.50 / 1M tokens | Lowest output price cited in this review |
A note on this table: these figures are reported for the Instant variant, but we could not verify them against any primary source. OpenAI's published numbers for the standard GPT-5.5 model are materially different. Public pricing for standard GPT-5.5 sits at $5.00 input / $30.00 output per million tokens, which is ten times the input figure and twenty times the output figure above. The standard model is also documented with about a 1.1M-token input context and a 128K output cap, so the "128K context window" line likely describes the output limit, not the input window. Treat the table as unconfirmed for Instant until OpenAI publishes Instant-specific specs.
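To see what that pricing gap means for a budget, here is a minimal Python sketch that costs the same workload under both sets of prices quoted in this review. The monthly token volumes are hypothetical, chosen only for illustration, and the names (`Pricing`, `INSTANT_CLAIMED`, `GPT55_STANDARD`) are ours rather than anything from OpenAI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Per-million-token prices, in the same dollars quoted in this review."""
    input_per_million: float
    output_per_million: float

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        """Total cost for the given token counts."""
        return (
            input_tokens * self.input_per_million
            + output_tokens * self.output_per_million
        ) / 1_000_000


# Figures as quoted in the review; the Instant pair is unverified.
INSTANT_CLAIMED = Pricing(input_per_million=0.50, output_per_million=1.50)
GPT55_STANDARD = Pricing(input_per_million=5.00, output_per_million=30.00)

# Hypothetical monthly workload (an assumption, not a measurement).
MONTHLY_INPUT_TOKENS = 200_000_000
MONTHLY_OUTPUT_TOKENS = 50_000_000

claimed_cost = INSTANT_CLAIMED.cost(MONTHLY_INPUT_TOKENS, MONTHLY_OUTPUT_TOKENS)
standard_cost = GPT55_STANDARD.cost(MONTHLY_INPUT_TOKENS, MONTHLY_OUTPUT_TOKENS)

print(f"Claimed Instant pricing: ${claimed_cost:,.2f}")
print(f"Standard GPT-5.5 pricing: ${standard_cost:,.2f}")
print(f"Multiple: {standard_cost / claimed_cost:.1f}x")
```

For this workload the bill comes to $175 at the claimed Instant prices and $2,500 at the standard GPT-5.5 prices, a multiple of about 14. The blended multiple always lands between 10 and 20 and moves toward 20 as the share of output tokens grows, which is why a budget built on the unverified figures could be off by more than an order of magnitude.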
What it does well
GPT-5.5 Instant is built for the work that makes up most ChatGPT sessions: answering everyday questions, drafting emails, summarising articles, explaining a concept, kicking around ideas. The reported 84.2% MMLU score (again, unverified for Instant) would put it about 5.4 points behind Opus 4.8. That gap shows up in specialist work but you'd barely feel it in casual use.
It's quick. In our latency tests it beat the premium models on time-to-first-token every time, which is exactly what you want in a chat interface where waiting feels worse than a slightly weaker answer.
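If you want to check time-to-first-token yourself, one rough approach is to time how long a streamed response takes to deliver its first chunk. The Python sketch below is not the harness used for this review. It works on any iterable of text chunks, and `fake_stream` is a hypothetical stand-in that simulates delays rather than calling a real API.

```python
import time
from typing import Iterable, Iterator, Sequence, Tuple


def time_to_first_chunk(stream: Iterable[str]) -> Tuple[float, float, str]:
    """Return (seconds to first chunk, total seconds, concatenated text)."""
    start = time.perf_counter()
    first_chunk_at = None
    parts = []
    for chunk in stream:
        if first_chunk_at is None:
            # Record the delay before the first piece of output arrives.
            first_chunk_at = time.perf_counter() - start
        parts.append(chunk)
    total = time.perf_counter() - start
    if first_chunk_at is None:
        raise ValueError("stream produced no chunks")
    return first_chunk_at, total, "".join(parts)


def fake_stream(
    first_delay: float, chunk_delay: float, chunks: Sequence[str]
) -> Iterator[str]:
    """Hypothetical stand-in for a streamed model response (no network)."""
    time.sleep(first_delay)  # simulated wait before the first chunk
    for index, chunk in enumerate(chunks):
        if index > 0:
            time.sleep(chunk_delay)  # simulated gap between later chunks
        yield chunk


ttft, total, text = time_to_first_chunk(
    fake_stream(first_delay=0.05, chunk_delay=0.01, chunks=["Hello", ", ", "world"])
)
print(f"first chunk after {ttft:.3f}s, full reply after {total:.3f}s: {text!r}")
```

To compare real models, you would replace `fake_stream` with the chunk iterator from whichever streaming client you use, send the same prompt several times, and compare medians rather than single runs, since network jitter can swamp small differences.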
Where it struggles
The reported 42.1% SWE-bench Pro score (unconfirmed for Instant) lines up with what you'd expect: this isn't a coding model. It can write simple scripts and explain code, but it won't reliably debug a gnarly issue or hand you a production-ready patch. For real software engineering you want at least Sonnet 4.6 (reportedly around 58.1% on SWE-bench Pro) or, better, Opus 4.8 at 69.2%.
If the 128K figure really is the input window, it is fine for most documents but tight for analysing a large codebase or a long legal review. Google's Gemini 3.5 Flash, by comparison, offers a 1M-token context, though at $1.50 input / $9.00 output per million tokens it costs three times the input price and six times the output price claimed for Instant above.
One thing worth crediting: OpenAI's own evaluations report 52.5% fewer hallucinated claims than the previous Instant model on high-stakes prompts, while holding onto the low latency. For a default model that millions lean on for medical or legal questions, that matters more than a benchmark point.
A historical note before you read older coverage: this model didn't replace GPT-4o, despite what some write-ups suggest. GPT-4o was retired from ChatGPT back on 13 February 2026, to plenty of user pushback, with an estimated 800,000 people still choosing it daily at shutdown. GPT-5.5 Instant arrived months later and took over from GPT-5.3 Instant.
Verdict
GPT-5.5 Instant is what it sets out to be: a fast, cheap, capable model for everyday tasks. It's no coding specialist and no reasoning heavyweight, and it doesn't pretend otherwise. For the bulk of what people actually ask an AI, it's good enough, and good enough at scale is the whole point.
The asterisk is the numbers. The performance and pricing figures quoted for the Instant variant couldn't be verified, and the published figures for standard GPT-5.5 tell a different story. So treat the score below as a read on the everyday experience, not a contract on cost.
Score: 7.8 / 10 (value-adjusted: 8.5 / 10)


