OpenAI's GPT 5.4 in 10 Minutes

OpenAI shipped GPT 5.4 and it matters. Not because it tops every benchmark--it doesn't--but because it changes what you can actually do with a model in production.

Two variants landed: GPT 5.4 Thinking and GPT 5.4. The first is the reasoning powerhouse. The second is the fast, capable default. Both have a million tokens of context and a new steerable thinking UX that lets you redirect the model's reasoning mid-response. That last part is new for everyone.

Let's break it down.

Access Tiers

This is where OpenAI's pricing maze gets real.

For model-selection context, compare this with "OpenAI Codex: Cloud AI Coding With GPT-5.3" and "OpenAI vs Anthropic in 2026 - Models, Tools, and Developer Experience". The useful question is not only benchmark quality, but where the model fits in a real developer workflow.

GPT 5.4 Thinking is available on ChatGPT Plus ($20/mo), Teams, Pro, and Enterprise. That's the reasoning model most people will use.

GPT 5.4 (the non-thinking variant) is locked to the $200/month Pro tier. If you want both, you're paying Pro pricing.

The API is live for both. More on pricing below.

Steerable Thinking

This is the standout UX innovation.

Previous thinking models gave you a plan upfront and then executed it. If the plan was wrong, you waited for it to finish and then corrected. Wasted tokens, wasted time.

GPT 5.4 Thinking shows you the plan as it forms and lets you steer it. Mid-response. You see the model's reasoning unfold and can inject corrections before it commits to a bad path.

[Figure: Steerable thinking UI showing mid-response intervention]

This matters for complex tasks where the model's first interpretation of your prompt isn't what you meant. Instead of regenerating from scratch, you nudge. It's closer to pair programming than prompt engineering.

Context and Efficiency

A million tokens of context, same as Opus 4.6. But OpenAI added a pricing twist: anything beyond 272k tokens costs 2x. So you can use the full million, but you'll pay for it.

For most workflows, 272k is plenty. If you're feeding entire codebases or long document chains, budget accordingly.
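
One way to budget is to decide up front which files go into a request so the total stays under the 272k mark. The sketch below is only an illustration, not an OpenAI tool: the `pack_under_threshold` helper, the file names, and the per-file token counts are all hypothetical, and real counts would come from whatever tokenizer your client uses.

```python
from typing import Iterable

STANDARD_RATE_LIMIT = 272_000  # tokens billed at the normal rate (figure from this article)


def pack_under_threshold(
    files: Iterable[tuple[str, int]],
    limit: int = STANDARD_RATE_LIMIT,
) -> tuple[list[str], list[str], int]:
    """Greedily keep files, in the order given, while the running total fits in `limit`.

    Returns (included names, skipped names, tokens used).
    """
    included: list[str] = []
    skipped: list[str] = []
    used = 0
    for name, tokens in files:
        if used + tokens <= limit:
            included.append(name)
            used += tokens
        else:
            skipped.append(name)
    return included, skipped, used


if __name__ == "__main__":
    # Hypothetical token counts, listed in priority order.
    repo = [
        ("src/core.py", 120_000),
        ("src/api.py", 90_000),
        ("docs/guide.md", 80_000),
        ("tests/test_core.py", 40_000),
    ]
    included, skipped, used = pack_under_threshold(repo)
    print(f"included={included} skipped={skipped} used={used}")
```

The greedy pass respects the order you give it, so list the files the model needs most first. Anything that lands in the skipped list is a candidate for summarizing or for a second request.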

Benchmarks

The headline number is OSWorld Verified--a benchmark for computer use tasks. GPT 5.4 hits 75%. Humans score 72.4%. That's not a typo. The model outperforms average human operators on structured computer tasks.

Benchmark	GPT 5.4	GPT 5.3	Claude Opus 4.6	Humans
OSWorld Verified	75.0%	58.3%	62.1%	72.4%
BrowseComp	71.2%	49.7%	53.8%	--
WebArena	68.4%	51.2%	55.6%	--
Agentic Coding (SWE-bench)	74.1%	69.2%	72.8%	--

BrowseComp and WebArena show meaningful jumps too. These are real-world browser automation tasks--navigating sites, filling forms, extracting data. If you're building agents that interact with the web, these numbers translate directly.
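
To see where the jumps are concentrated, it helps to turn the table into percentage-point gaps. This is a small sketch that uses only the scores from the table above; the `gaps` helper is a name chosen for illustration.

```python
# Scores copied from the table above (percent).
SCORES: dict[str, dict[str, float]] = {
    "OSWorld Verified": {"GPT 5.4": 75.0, "GPT 5.3": 58.3, "Claude Opus 4.6": 62.1, "Humans": 72.4},
    "BrowseComp": {"GPT 5.4": 71.2, "GPT 5.3": 49.7, "Claude Opus 4.6": 53.8},
    "WebArena": {"GPT 5.4": 68.4, "GPT 5.3": 51.2, "Claude Opus 4.6": 55.6},
    "Agentic Coding (SWE-bench)": {"GPT 5.4": 74.1, "GPT 5.3": 69.2, "Claude Opus 4.6": 72.8},
}


def gaps(benchmark: str, baseline: str = "GPT 5.4") -> dict[str, float]:
    """Percentage-point lead of `baseline` over every other column in one row."""
    row = SCORES[benchmark]
    return {
        name: round(row[baseline] - score, 1)
        for name, score in row.items()
        if name != baseline
    }


if __name__ == "__main__":
    for benchmark in SCORES:
        print(benchmark, gaps(benchmark))
```

On these numbers the lead over Opus 4.6 is 12.8 to 17.4 points on the three computer-use and browser benchmarks, but only 1.3 points on agentic coding. The gains are concentrated in computer use and browsing.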

[Figure: Benchmark comparison chart across computer use and coding tasks]

Knowledge Work

OpenAI is leaning into "knowledge work" as a category. Think polished documents, presentations, structured reports. The outputs are noticeably more formatted and complete than 5.3. Fewer rough edges. Better structure.

This is less relevant for developers and more relevant if you're using the API to generate client-facing content. But it signals where OpenAI sees the commercial opportunity: enterprise users who need production-ready documents, not raw text.

Browser Agent Workflows

The computer use capabilities are where GPT 5.4 pulls ahead of the field. OSWorld Verified at 75% isn't just a benchmark win--it means the model can reliably execute multi-step browser workflows.

Navigate to a site. Find the right form. Fill it out. Submit. Verify the result. GPT 5.4 does this with higher reliability than any other model right now, including Opus 4.6.

If you're building browser automation agents, this is the model to test against.
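
When you test a model against this kind of workflow, it helps to check the final "verify" step independently of the agent. The following is a minimal sketch of such a harness, assuming a hypothetical `Browser` interface that your own automation layer would implement. None of these names come from OpenAI's API, and `FlakyFakeBrowser` exists only to exercise the loop.

```python
from typing import Protocol


class Browser(Protocol):
    """Hypothetical driver interface; not a real library."""

    def goto(self, url: str) -> None: ...
    def fill(self, field: str, value: str) -> None: ...
    def submit(self) -> None: ...
    def page_text(self) -> str: ...


def run_form_workflow(
    browser: Browser,
    url: str,
    fields: dict[str, str],
    success_marker: str,
    max_attempts: int = 3,
) -> bool:
    """Navigate, fill, submit, then verify; retry from the start if verification fails."""
    for _ in range(max_attempts):
        browser.goto(url)
        for field, value in fields.items():
            browser.fill(field, value)
        browser.submit()
        if success_marker in browser.page_text():
            return True
    return False


class FlakyFakeBrowser:
    """In-memory stand-in for demonstration: the first submit fails, later ones succeed."""

    def __init__(self) -> None:
        self.submits = 0
        self.filled: dict[str, str] = {}

    def goto(self, url: str) -> None:
        self.filled = {}  # a fresh page load clears the form

    def fill(self, field: str, value: str) -> None:
        self.filled[field] = value

    def submit(self) -> None:
        self.submits += 1

    def page_text(self) -> str:
        return "Thanks, received" if self.submits >= 2 and self.filled else "Error"


if __name__ == "__main__":
    ok = run_form_workflow(
        FlakyFakeBrowser(), "https://example.test/form", {"email": "a@b.c"}, "Thanks"
    )
    print("workflow succeeded:", ok)
```

Keeping the success check outside the agent means you can swap in a different model and compare pass rates on the same workflows.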

Coding and Frontend Wins

The coding demos are strong. Web games, 3D simulations, complex frontend layouts--all generated with fewer iterations than 5.3. The Cursor team gave positive feedback on integration quality, which matters more than synthetic benchmarks for day-to-day coding workflows.

Where it really shines is frontend. HTML/CSS/JS generation is tighter. Fewer layout bugs. Better responsive handling. If you're using an AI coding assistant for UI work, GPT 5.4 is worth switching to.

API Pricing

Standard pricing for the API:

GPT 5.4:
  Input:  $2.50 / 1M tokens
  Output: $10.00 / 1M tokens

GPT 5.4 Thinking:
  Input:  $5.00 / 1M tokens
  Output: $20.00 / 1M tokens

Context beyond 272k tokens: 2x multiplier on both input and output

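Compared to Opus 4.6 ($5 input / $25 output), GPT 5.4 is never more expensive at standard rates. GPT 5.4 Thinking matches Opus on input and undercuts it on output ($20 vs. $25). The non-thinking variant is half the cost of Opus on input and 60% cheaper on output ($10 vs. $25). If your workload doesn't need extended reasoning, that's significant savings at scale.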
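
To price out a specific request, the rates above can be dropped into a small estimator. This is a sketch under stated assumptions, not a billing reference. It assumes the 2x multiplier applies only to the tokens past 272k, counting input first and then output; if the multiplier instead applies to the whole request, totals will be higher. It also bills Opus 4.6 at flat rates, since this article gives no long-context rule for it. `Price` and `request_cost` are illustrative names.

```python
from dataclasses import dataclass

LONG_CONTEXT_THRESHOLD = 272_000  # tokens; figure from this article
LONG_CONTEXT_MULTIPLIER = 2.0     # figure from this article


@dataclass(frozen=True)
class Price:
    input_per_m: float    # USD per 1M input tokens
    output_per_m: float   # USD per 1M output tokens
    long_context_surcharge: bool


PRICES = {
    "GPT 5.4": Price(2.50, 10.00, True),
    "GPT 5.4 Thinking": Price(5.00, 20.00, True),
    # Assumption: flat rates, because no long-context rule for Opus is stated here.
    "Claude Opus 4.6": Price(5.00, 25.00, False),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request under the assumptions described above."""
    price = PRICES[model]
    if price.long_context_surcharge:
        # Input fills the standard-rate window first; output gets whatever is left.
        in_base = min(input_tokens, LONG_CONTEXT_THRESHOLD)
        out_base = min(output_tokens, LONG_CONTEXT_THRESHOLD - in_base)
    else:
        in_base, out_base = input_tokens, output_tokens
    in_extra = input_tokens - in_base
    out_extra = output_tokens - out_base
    total = (
        (in_base + LONG_CONTEXT_MULTIPLIER * in_extra) * price.input_per_m
        + (out_base + LONG_CONTEXT_MULTIPLIER * out_extra) * price.output_per_m
    )
    return total / 1_000_000


if __name__ == "__main__":
    for model in PRICES:
        print(f"{model}: ${request_cost(model, 100_000, 5_000):.3f}")
```

Under these assumptions, a request with 100k input tokens and 5k output tokens comes to about $0.30 on GPT 5.4, $0.60 on GPT 5.4 Thinking, and $0.625 on Opus 4.6. Pushing the input to 400k tokens raises the GPT 5.4 figure to about $1.42.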

Versus Claude Opus 4.6

The honest comparison: they're different tools for different jobs.

Opus 4.6 wins on: agentic terminal coding, long-horizon multi-step tasks, agent team coordination, agentic search. If you're running Claude Code with agent teams on complex codebases, Opus is still the frontier.

GPT 5.4 wins on: computer use, browser automation, frontend code generation, knowledge work output quality, and price-per-token. If you're building web agents or need polished document generation, GPT 5.4 is the better choice.

Neither model dominates everything. Pick based on your workload.

Codex Fast Mode

OpenAI also shipped a fast mode for Codex that runs 1.5x faster than the standard mode. If you're using Codex for batch code generation or CI pipelines, the speed improvement compounds.

This is a quiet but important update. Faster inference means tighter feedback loops. Tighter feedback loops mean more iterations per hour.

Practical Next Steps

Test browser automation workflows. If you have agents that navigate websites, GPT 5.4's computer use scores are best-in-class. Run your existing test suite against it.
Try steerable thinking on complex prompts. The mid-response intervention UX is genuinely new. It changes how you interact with reasoning models.
Compare costs. If you're running high-volume API calls with Opus, price out the same workload on GPT 5.4. The savings might justify a switch for certain tasks.
Watch the 272k boundary. That 2x pricing cliff is easy to hit if you're feeding large codebases. Monitor your token usage.

Further Reading

GPT-5: OpenAI's Most Capable Model

GPT-5 Codex: OpenAI's Agentic Coding Model

OpenAI vs Anthropic in 2026 - Models, Tools, and Developer Experience
