GPT-5.4: OpenAI's New All-in-One Frontier Model

Key insights
- GPT-5.4 combines GPT-5.2's general knowledge and GPT-5.3 Codex's coding ability into one model, matching the unified approach Anthropic took with Opus 4.6
- Early testers report strong coding and agentic performance, but note that front-end design quality still trails Opus 4.6 and Gemini 3.1 Pro
- Pricing has increased compared to GPT-5.2, with input tokens at $2.50 per million and output at $15 per million for the standard tier
This article is a summary of the video "OpenAI just dropped GPT-5.4 and WOW....." Watch the video →
In Brief
OpenAI released GPT-5.4, a new frontier model (the most advanced tier of AI) that merges the coding capabilities of GPT-5.3 Codex with GPT-5.2's strengths in creative writing, personality, and general knowledge. YouTube creator Matthew Berman, who had early access for a week, describes it as a single model that can handle coding, computer use, document work, and agentic tasks (AI that takes actions independently using tools). The model comes in two variants: 5.4 Thinking and 5.4 Pro, and includes a 1 million token context window, matching what Anthropic's Claude models already offer.
What happened
OpenAI shipped GPT-5.4 on March 6, 2026, making it the company's first model that handles both coding and general-purpose tasks at frontier level. Previously, users had to choose between GPT-5.2 for writing and conversation or GPT-5.3 Codex for code. GPT-5.4 rolls both into one (2:08).
Berman compares this directly to Anthropic's approach with Opus 4.6, which already combined strong coding with broad knowledge in a single model (0:46). With 5.4, OpenAI closes that gap.
The model also introduces an upfront planning feature for the Thinking variant. Instead of immediately generating code or text, it can outline its approach first (7:22). This is similar to a feature already popular in coding tools like Cursor.
Benchmark highlights
| Benchmark | GPT-5.4 Thinking | Comparison |
|---|---|---|
| OSWorld (computer use) | 75% | Opus 4.6: 72.7% (4:02) |
| SWE-bench Pro (coding) | 57.7% | Gemini 3.1 Pro: 54.2% (4:16) |
| GDPval (knowledge work) | 83% | Opus 4.6: 78% (5:04) |
Pricing: higher cost for frontier capability
The new model costs more than its predecessors. GPT-5.4 runs $2.50 per million input tokens compared to $1.75 for GPT-5.2, a 43% increase (11:22). Output pricing sits at $15 per million tokens. The Pro variant is significantly more expensive at $30 per million input tokens and $180 per million output tokens.
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| GPT-5.2 | $1.75 | $14 |
| GPT-5.4 | $2.50 | $15 |
| GPT-5.2 Pro | $21 | $168 |
| GPT-5.4 Pro | $30 | $180 |
Input costs can be reduced through caching (reusing previously processed text); there is no equivalent discount on output tokens.
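As a rough sketch of what these rates mean per request, the table's prices can be plugged into a small calculator. The prices come from the table above; the 90% cache discount is an illustrative assumption, not a published figure:

```python
# Rough cost calculator for the per-token prices listed above.
# Prices are USD per 1M tokens; the 90% cache discount is an
# illustrative assumption, not a published figure.

PRICES = {
    "gpt-5.2":     {"input": 1.75, "output": 14.0},
    "gpt-5.4":     {"input": 2.50, "output": 15.0},
    "gpt-5.2-pro": {"input": 21.0, "output": 168.0},
    "gpt-5.4-pro": {"input": 30.0, "output": 180.0},
}

def request_cost(model, input_tokens, output_tokens,
                 cached_tokens=0, cache_discount=0.90):
    """Return the USD cost of one request.

    cached_tokens: the portion of input_tokens billed at the
    discounted cache rate (discount assumed for illustration).
    """
    p = PRICES[model]
    fresh = input_tokens - cached_tokens
    cost = (fresh * p["input"]
            + cached_tokens * p["input"] * (1 - cache_discount)
            + output_tokens * p["output"]) / 1_000_000
    return round(cost, 4)

# 100k input tokens (half of them cached) plus 10k output tokens
# on GPT-5.4 comes to about 29 cents:
print(request_cost("gpt-5.4", 100_000, 10_000, cached_tokens=50_000))
```

At these rates the output side dominates long responses, which is why the lack of an output discount matters more than the headline input price.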
Early reactions
Several early testers shared their impressions. Matt Shumer called it "the best model on the planet by far" and said 5.4 Thinking covered all his use cases, making Pro models unnecessary (14:18). He described the coding capabilities as "essentially flawless," though Berman notes this is likely an overstatement.
Shumer also flagged weaknesses. Front-end design quality reportedly trails both Opus 4.6 and Google's Gemini 3.1 Pro (14:51). He also found the model stopped short of completing tasks when used inside the OpenClaw agent framework (15:14). OpenAI CEO Sam Altman reportedly responded that these issues would be fixed quickly.
Flavio Adamo, another early tester, said the model completed tasks within Codex that previous models found too time-consuming (15:43). Peter Steinberger, now an OpenAI employee, described it as "a better general purpose agent" that writes better documentation, though he noted the coding-specific improvement is smaller than the jump from GPT-5.0 to 5.1 (16:02).
What we are tracking next
- Whether the front-end design gap with Opus 4.6 narrows in future updates.
- How GPT-5.4 performs as the primary model in agent frameworks like OpenClaw over time.
- Whether pricing pressure from competitors forces adjustments, as frontier models continue getting more expensive.
Glossary
| Term | Definition |
|---|---|
| Frontier model | The most capable and advanced AI model a company offers. Usually also the most expensive. |
| Context window | The maximum amount of text an AI model can process in a single conversation. 1 million tokens is roughly 750,000 words. |
| Token | The smallest unit of text an AI model works with. Roughly 3-4 characters or about three-quarters of a word. |
| Agentic | AI that can act independently by using tools, browsing the web, or controlling a computer rather than just generating text. |
| Benchmark | A standardized test used to compare AI model performance. Different benchmarks measure different skills. |
| SWE-bench | A benchmark that tests how well AI can fix real software bugs pulled from GitHub repositories. |
| OSWorld | A benchmark that measures how accurately an AI can operate a full computer operating system. |
| GDPval | OpenAI's benchmark measuring how well models complete real knowledge work tasks like spreadsheets, documents, and presentations. |
| Caching | Reusing previously processed input to save time and cost. Like pre-cooking ingredients so later meals go faster. |
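The token and context-window figures in the glossary can be turned into a back-of-the-envelope estimator. The 4-characters-per-token ratio is the rough heuristic cited above, not an exact tokenizer, and the output reservation is an arbitrary example value:

```python
# Back-of-the-envelope token estimate using the ~4 chars/token
# heuristic from the glossary. Real tokenizers vary by model and
# language, so treat this as a rough guide only.

CHARS_PER_TOKEN = 4          # rough average for English text
CONTEXT_WINDOW = 1_000_000   # GPT-5.4's stated context window

def estimate_tokens(text: str) -> int:
    """Approximate token count; at least 1 for non-trivial input."""
    return max(1, len(text) // CHARS_PER_TOKEN)

def fits_in_context(text: str, reserve_for_output: int = 4_096) -> bool:
    """True if the text leaves room for the reserved output tokens."""
    return estimate_tokens(text) + reserve_for_output <= CONTEXT_WINDOW

sample = "OpenAI released GPT-5.4, a new frontier model."
print(estimate_tokens(sample))   # 46 characters -> 11 tokens
print(fits_in_context(sample))   # True
```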
Sources and resources
Want to go deeper? Watch the full video on YouTube →