GPT-5.4: OpenAI's New All-in-One Frontier Model

Key insights
- GPT-5.4 combines GPT-5.2's general knowledge and GPT-5.3 Codex's coding ability into one model, matching the unified approach Anthropic took with Opus 4.6
- Early testers report strong coding and agentic performance, but note that front-end design quality still trails Opus 4.6 and Gemini 3.1 Pro
- Pricing has increased compared to GPT-5.2, with input tokens at $2.50 per million and output at $15 per million for the standard tier
This article is a summary of the video "OpenAI just dropped GPT-5.4 and WOW....." Watch the video →
In Brief
OpenAI released GPT-5.4, a new frontier model (the most advanced tier of AI) that merges the coding capabilities of GPT-5.3 Codex with GPT-5.2's strengths in creative writing, personality, and general knowledge. YouTube creator Matthew Berman, who had early access for a week, describes it as a single model that can handle coding, computer use, document work, and agentic tasks (AI that takes actions independently using tools). The model comes in two variants: 5.4 Thinking and 5.4 Pro, and includes a 1 million token context window, matching what Anthropic's Claude models already offer.
What happened
OpenAI shipped GPT-5.4 on March 6, 2026, making it the company's first model that handles both coding and general-purpose tasks at frontier level. Previously, users had to choose between GPT-5.2 for writing and conversation or GPT-5.3 Codex for code. GPT-5.4 rolls both into one (2:08).
Berman compares this directly to Anthropic's approach with Opus 4.6, which already combined strong coding with broad knowledge in a single model (0:46). With 5.4, OpenAI closes that gap.
The model also introduces an upfront planning feature for the Thinking variant. Instead of immediately generating code or text, it can outline its approach first (7:22). This is similar to a feature already popular in coding tools like Cursor.
Benchmark highlights
| Benchmark | GPT-5.4 Thinking | Comparison |
|---|---|---|
| OSWorld (computer use) | 75% | Opus 4.6: 72.7% (4:02) |
| SWE-bench Pro (coding) | 57.7% | Gemini 3.1 Pro: 54.2% (4:16) |
| GDPval (knowledge work) | 83% | Opus 4.6: 78% (5:04) |
Pricing: higher cost for frontier capability
The new model costs more than its predecessors. GPT-5.4 runs $2.50 per million input tokens compared to $1.75 for GPT-5.2, a 43% increase (11:22). Output pricing sits at $15 per million tokens. The Pro variant is significantly more expensive at $30 per million input tokens and $180 per million output tokens.
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| GPT-5.2 | $1.75 | $14 |
| GPT-5.4 | $2.50 | $15 |
| GPT-5.2 Pro | $21 | $168 |
| GPT-5.4 Pro | $30 | $180 |
Input costs can be reduced through caching (reusing previously processed text); there is no equivalent discount on output tokens.
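As a rough sketch of what these rates mean per request, the table's prices can be plugged into a small calculator. The prices come from the table above; the 90% cache discount is an illustrative assumption, not a published figure:

```python
# Rough cost calculator for the per-token prices listed above.
# Prices are USD per 1M tokens; the 90% cache discount is an
# illustrative assumption, not a published figure.

PRICES = {
    "gpt-5.2":     {"input": 1.75, "output": 14.0},
    "gpt-5.4":     {"input": 2.50, "output": 15.0},
    "gpt-5.2-pro": {"input": 21.0, "output": 168.0},
    "gpt-5.4-pro": {"input": 30.0, "output": 180.0},
}

def request_cost(model, input_tokens, output_tokens,
                 cached_tokens=0, cache_discount=0.90):
    """Return the USD cost of one request.

    cached_tokens: the portion of input_tokens billed at the
    discounted cache rate (discount assumed for illustration).
    """
    p = PRICES[model]
    fresh = input_tokens - cached_tokens
    cost = (fresh * p["input"]
            + cached_tokens * p["input"] * (1 - cache_discount)
            + output_tokens * p["output"]) / 1_000_000
    return round(cost, 4)

# 100k input tokens (half of them cached) plus 10k output tokens
# on GPT-5.4 comes to about 29 cents:
print(request_cost("gpt-5.4", 100_000, 10_000, cached_tokens=50_000))
```

At these rates the output side dominates long responses, which is why the lack of an output discount matters more than the headline input price.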
Early reactions
Several early testers shared their impressions. Matt Shumer called it "the best model on the planet by far" and said 5.4 Thinking covered all his use cases, making Pro models unnecessary (14:18). He described the coding capabilities as "essentially flawless," though Berman notes this is likely an overstatement.
Shumer also flagged weaknesses. Front-end design quality reportedly trails both Opus 4.6 and Google's Gemini 3.1 Pro (14:51). He also found the model stopped short of completing tasks when used inside the OpenClaw agent framework (15:14). OpenAI CEO Sam Altman reportedly responded that these issues would be fixed quickly.
Flavio Adamo, another early tester, said the model completed tasks within Codex that previous models found too time-consuming (15:43). Peter Steinberger, now an OpenAI employee, described it as "a better general purpose agent" that writes better documentation, though he noted the coding-specific improvement is smaller than the jump from GPT-5.0 to 5.1 (16:02).
What we are tracking next
- Whether the front-end design gap with Opus 4.6 narrows in future updates.
- How GPT-5.4 performs as the primary model in agent frameworks like OpenClaw over time.
- Whether pricing pressure from competitors forces adjustments, as frontier models continue getting more expensive.
Glossary
| Term | Definition |
|---|---|
| Frontier model | The most capable and advanced AI model a company offers. Usually also the most expensive. |
| Context window | The maximum amount of text an AI model can process in a single conversation. 1 million tokens is roughly 750,000 words. |
| Token | The smallest unit of text an AI model works with. Roughly 3-4 characters or about three-quarters of a word. |
| Agentic | AI that can act independently by using tools, browsing the web, or controlling a computer rather than just generating text. |
| Benchmark | A standardized test used to compare AI model performance. Different benchmarks measure different skills. |
| SWE-bench | A benchmark that tests how well AI can fix real software bugs pulled from GitHub repositories. |
| OSWorld | A benchmark that measures how accurately an AI can operate a full computer operating system. |
| GDPval | OpenAI's benchmark measuring how well models complete real knowledge work tasks like spreadsheets, documents, and presentations. |
| Caching | Reusing previously processed input to save time and cost. Like pre-cooking ingredients so later meals go faster. |
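The token and context-window figures in the glossary can be turned into a back-of-the-envelope estimator. The 4-characters-per-token ratio is the rough heuristic cited above, not an exact tokenizer, and the output reservation is an arbitrary example value:

```python
# Back-of-the-envelope token estimate using the ~4 chars/token
# heuristic from the glossary. Real tokenizers vary by model and
# language, so treat this as a rough guide only.

CHARS_PER_TOKEN = 4          # rough average for English text
CONTEXT_WINDOW = 1_000_000   # GPT-5.4's stated context window

def estimate_tokens(text: str) -> int:
    """Approximate token count; at least 1 for non-trivial input."""
    return max(1, len(text) // CHARS_PER_TOKEN)

def fits_in_context(text: str, reserve_for_output: int = 4_096) -> bool:
    """True if the text leaves room for the reserved output tokens."""
    return estimate_tokens(text) + reserve_for_output <= CONTEXT_WINDOW

sample = "OpenAI released GPT-5.4, a new frontier model."
print(estimate_tokens(sample))   # 46 characters -> 11 tokens
print(fits_in_context(sample))   # True
```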
Sources and resources
Want to go deeper? Watch the full video on YouTube →