
OpenAI Introduces GPT-5.5: Smarter and More Autonomous

April 27, 2026 · 6 min read · 1,103 words
Tags: OpenAI, ChatGPT, GPT, Codex, AI Agents, AI Security

In Brief

OpenAI released GPT-5.5 on April 24, 2026, calling it their smartest model so far. The headline claim is not just a higher intelligence score. It is that GPT-5.5 can take a messy, multi-part task and plan it, use tools, check its own work, and keep going until it is finished, instead of stopping after every step to ask what to do next.

The model is rolling out now to paying ChatGPT users (Plus, Pro, Business, Enterprise) and inside Codex, OpenAI's coding environment. A more powerful variant called GPT-5.5 Pro is available to Pro, Business, and Enterprise users. The API will follow soon. Pricing in the API starts at $5 per million input tokens.

What's new

OpenAI is positioning GPT-5.5 less as "ChatGPT but smarter" and more as a model built to do work on a computer. The release post groups the gains into three areas, each worth taking in turn.

Coding that finishes the job

GPT-5.5 is OpenAI's strongest coding model so far. On Terminal-Bench 2.0, a test that measures how well a model handles complex command-line workflows (the typing-only interface developers use to control a computer), GPT-5.5 scores 82.7%, up from 75.1% for GPT-5.4. On SWE-Bench Pro, which evaluates real GitHub issue resolution, it reaches 58.6%.

Numbers aside, what early testers describe is a different kind of help. Senior engineers said GPT-5.5 catches problems before they happen, predicts what tests and reviews will be needed, and stays on a task much longer without giving up. One engineer at NVIDIA said losing access to it "feels like I've had a limb amputated."

A practical example from the announcement: an engineer asked GPT-5.5 to re-architect the comment system in a collaborative editor. The model came back with a stack of twelve code changes that was "nearly complete." That kind of multi-file, multi-step work has been the weak spot for AI coding assistants until now.

Computer-based work, not just chatting

The same skills that make GPT-5.5 good at coding also make it useful for everyday knowledge work: finding information, building documents and spreadsheets, navigating between apps, checking results, and turning rough inputs into something usable.

OpenAI says more than 85% of its own employees use Codex every week, in roles that include finance, marketing, and communications. Examples in the launch post: the finance team used it to review 24,771 tax forms (71,637 pages), and a sales employee automated weekly business reports, saving 5-10 hours a week.

In ChatGPT itself, the new GPT-5.5 Pro variant is aimed at harder questions where accuracy and depth matter most: business analysis, legal work, education, and data science.

Scientific research

The most striking example in the announcement is mathematical. An internal version of GPT-5.5 helped find a new proof about Ramsey numbers, an old problem in combinatorics, the branch of mathematics that studies how discrete objects fit together. The proof was later verified in Lean, a programming language used to check mathematical proofs by computer.
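For readers who have not seen Lean, a machine-checked statement looks much like ordinary code: the theorem is a type, and the proof is a program the compiler verifies. The toy example below (commutativity of addition, using Lean 4's built-in `Nat.add_comm`) is purely illustrative and unrelated to the Ramsey-number proof itself.

```lean
-- A trivial machine-checked statement in Lean 4.
-- If this compiles, the proof is correct by construction.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```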

OpenAI also reports gains on benchmarks for biology and bioinformatics work, where GPT-5.5 can apparently take on multi-day analysis tasks that would normally require expert scientists. The framing OpenAI uses is that the model is now strong enough to act as a "co-scientist."

Same speed, fewer tokens

Bigger AI models usually get slower to respond as they get smarter. GPT-5.5 is the unusual case: OpenAI says it matches GPT-5.4's per-token latency (the time you wait between sending a prompt and getting a reply) while operating at a much higher level of intelligence. It also uses significantly fewer tokens (the small text-piece units the model reads and writes) to finish the same Codex tasks.
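Why the token count matters as much as the latency: at the same per-token speed, an answer that needs fewer tokens simply arrives sooner. The sketch below makes that arithmetic concrete; the 20 ms-per-token figure and the token counts are hypothetical, not numbers from OpenAI.

```python
# Illustrative only: at equal per-token latency, a model that needs
# fewer tokens to finish the same task responds faster overall.

def completion_time(total_tokens: int, seconds_per_token: float) -> float:
    """Rough end-to-end generation time for one response."""
    return total_tokens * seconds_per_token

PER_TOKEN_LATENCY = 0.02  # 20 ms per token: a made-up example figure

# Same task, same per-token speed, different verbosity.
verbose = completion_time(3000, PER_TOKEN_LATENCY)   # older-style answer
concise = completion_time(2000, PER_TOKEN_LATENCY)   # fewer tokens

print(f"verbose: {verbose:.0f}s, concise: {concise:.0f}s")
# prints "verbose: 60s, concise: 40s"
```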

A lot of this comes from hardware. GPT-5.5 was co-designed for, trained with, and served on NVIDIA GB200 and GB300 NVL72 systems, large racks of AI-specific chips. OpenAI also says the model and Codex were used to improve the very infrastructure that runs them, including custom load-balancing code that increased token generation speeds by over 20%.

The takeaway: more capability per dollar. On the Artificial Analysis Coding Index, an external benchmark, GPT-5.5 delivers state-of-the-art performance at roughly half the cost of competing frontier coding models.

Stronger cybersecurity controls

OpenAI is treating GPT-5.5's biological/chemical and cybersecurity capabilities as High under their Preparedness Framework, their internal grading system for how risky a model's capabilities could be. High is one step below Critical, so the model did not cross the most serious threshold, but its cyber abilities are clearly stronger than GPT-5.4's.

The response is two-sided. On one hand, OpenAI is deploying tighter classifiers (automated filters) for sensitive cyber requests, with extra protections against repeated misuse. Some users will hit refusals they did not see before.

On the other hand, OpenAI is expanding access for verified defenders. A new program called Trusted Access for Cyber lets security teams apply for less-restricted models when working on legitimate defensive tasks. Critical-infrastructure operators (power grids, water systems, public records) can also apply for an even more permissive model called GPT-5.4-Cyber.

Pricing and availability

In ChatGPT and Codex, GPT-5.5 is included in paid plans. Codex now ships with a 400,000-token context window, meaning the model can keep that many tokens (roughly the length of a long novel) in mind at once.
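The "long novel" comparison can be sanity-checked with the common rule of thumb that an English token averages about three-quarters of a word. That ratio is an approximation that varies by text and tokenizer, not an exact conversion.

```python
# Back-of-envelope check of the 400,000-token context window.
WORDS_PER_TOKEN = 0.75  # rough English average; varies by tokenizer

def tokens_to_words(tokens: int) -> int:
    """Approximate word count for a given token count."""
    return round(tokens * WORDS_PER_TOKEN)

print(tokens_to_words(400_000))  # prints 300000: long-novel territory
```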

In the API (the interface developers use to send requests programmatically), GPT-5.5 will be priced at $5 per million input tokens and $30 per million output tokens, with a 1,000,000-token context window. GPT-5.5 Pro is significantly more expensive at $30 per million input and $180 per million output tokens, aimed at workloads where accuracy is worth the price difference. Batch and Flex pricing are available at half the standard rate.
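Using the per-million-token prices quoted above, a short sketch shows what individual requests would cost. The request sizes and the `request_cost` helper are illustrative assumptions for the arithmetic, not part of any official SDK.

```python
# USD per million tokens (input, output), as quoted in the announcement.
PRICES = {
    "gpt-5.5": (5.00, 30.00),
    "gpt-5.5-pro": (30.00, 180.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int,
                 batch: bool = False) -> float:
    """Dollar cost of one API call; Batch/Flex halve the standard rate."""
    inp, out = PRICES[model]
    cost = (input_tokens * inp + output_tokens * out) / 1_000_000
    return cost / 2 if batch else cost

# A hypothetical call: 10,000 tokens in, 2,000 tokens out.
print(request_cost("gpt-5.5", 10_000, 2_000))               # prints 0.11
print(request_cost("gpt-5.5-pro", 10_000, 2_000))           # prints 0.66
print(request_cost("gpt-5.5", 10_000, 2_000, batch=True))   # prints 0.055
```

Note how output tokens dominate the bill at these rates: the 2,000 output tokens cost more than the 10,000 input tokens in the standard example.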

The bigger picture

GPT-5.5 itself is an incremental step from GPT-5.4. The interesting shift is what OpenAI emphasizes in the announcement: the model is sold less as a question-answering tool and more as something that can take action on a computer over time. Plan, use tools, check, retry, finish.

That is the direction the whole industry has been pointing toward for the last year, what people now call agentic AI. The tradeoffs are also familiar: more autonomy means more capability for legitimate work, more risk if misused, and more pressure on the safeguards. With GPT-5.5, OpenAI is leaning in on all three at once.

Glossary

Agentic AI: AI that plans, uses tools, and finishes a task on its own, instead of just answering one question at a time.
Token: The text-piece unit a language model reads and writes. A token is roughly a few characters or a short word.
Context window: How many tokens a model can keep in mind at once during a single conversation or task.
Latency: How long you wait between sending a prompt and getting an answer back.
Codex: OpenAI's coding-focused product, where their models do hands-on engineering work like writing, debugging, and refactoring code.
Benchmark: A standardized test that measures how well a model performs on a defined task, used to compare models against each other.
Preparedness Framework: OpenAI's internal grading system for how risky a model's capabilities are, with levels including High and Critical.
