TL;DR
State-of-the-art computer use, steerable thinking you can redirect mid-response, and a million tokens of context. GPT 5.4 is OpenAI's most capable model yet.
OpenAI shipped GPT 5.4 and it matters. Not because it tops every benchmark (it doesn't), but because it changes what you can actually do with a model in production.
Two variants landed: GPT 5.4 Thinking and GPT 5.4. The first is the reasoning powerhouse. The second is the fast, capable default. Both have a million tokens of context and a new steerable thinking UX that lets you redirect the model's reasoning mid-response. That last part is new for everyone.
Let's break it down.
Availability is where OpenAI's pricing maze gets real.
For model-selection context, compare this with "OpenAI Codex: Cloud AI Coding With GPT-5.3" and "OpenAI vs Anthropic in 2026 - Models, Tools, and Developer Experience". The useful question is not only benchmark quality but where the model fits in a real developer workflow.
GPT 5.4 Thinking is available on ChatGPT Plus ($20/mo), Teams, Pro, and Enterprise. That's the reasoning model most people will use.
GPT 5.4 (the non-thinking variant) is locked to the $200/month Pro tier. If you want both, you're paying Pro pricing.
The API is live for both. More on pricing below.
Steerable thinking is the standout UX innovation.
Previous thinking models gave you a plan upfront and then executed it. If the plan was wrong, you waited for it to finish and then corrected. Wasted tokens, wasted time.
GPT 5.4 Thinking shows you the plan as it forms and lets you steer it. Mid-response. You see the model's reasoning unfold and can inject corrections before it commits to a bad path.

This matters for complex tasks where the model's first interpretation of your prompt isn't what you meant. Instead of regenerating from scratch, you nudge. It's closer to pair programming than prompt engineering.
Both variants get a million tokens of context, the same as Opus 4.6. But OpenAI added a pricing twist: anything beyond 272k tokens costs 2x. You can use the full million, but you'll pay for it.
For most workflows, 272k is plenty. If you're feeding entire codebases or long document chains, budget accordingly.
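To see what "2x beyond 272k" means for a single large prompt, here is a minimal sketch that converts a context size into standard-rate token equivalents. It assumes the threshold is exactly 272,000 tokens and that only the tokens past it are billed at double rate, which is one reading of the wording above; check OpenAI's pricing page before budgeting on it.

```python
# Sketch of the long-context surcharge described above.
# Assumptions (not confirmed by OpenAI docs): the threshold is 272,000 tokens,
# and only the tokens past the threshold are billed at 2x.
THRESHOLD = 272_000
SURCHARGE = 2.0

def effective_tokens(context_tokens: int,
                     threshold: int = THRESHOLD,
                     surcharge: float = SURCHARGE) -> float:
    """Return the context size expressed in standard-rate token equivalents."""
    if context_tokens < 0:
        raise ValueError("token count must be non-negative")
    standard = min(context_tokens, threshold)       # billed at 1x
    premium = max(context_tokens - threshold, 0)    # billed at the surcharge
    return standard + premium * surcharge

for n in (200_000, 272_000, 500_000, 1_000_000):
    eff = effective_tokens(n)
    print(f"{n:>9,} tokens -> billed like {eff:>11,.0f} ({eff / n:.3f}x)")
```

Under this reading, a full million-token context bills like 1,728,000 standard tokens, about 1.73x the base rate. If the surcharge instead applies to the whole request once it crosses the threshold, the factor would be a flat 2x.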
The headline number is OSWorld Verified--a benchmark for computer use tasks. GPT 5.4 hits 75%. Humans score 72.4%. That's not a typo. The model outperforms average human operators on structured computer tasks.
| Benchmark | GPT 5.4 | GPT 5.3 | Claude Opus 4.6 | Humans |
|---|---|---|---|---|
| OSWorld Verified | 75.0% | 58.3% | 62.1% | 72.4% |
| BrowseComp | 71.2% | 49.7% | 53.8% | -- |
| WebArena | 68.4% | 51.2% | 55.6% | -- |
| Agentic Coding (SWE-bench) | 74.1% | 69.2% | 72.8% | -- |
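BrowseComp and WebArena show big jumps too: +21.5 points on BrowseComp (49.7% to 71.2%) and +17.2 points on WebArena (51.2% to 68.4%) over GPT 5.3. These are real-world browser automation tasks: navigating sites, filling forms, extracting data. If you're building agents that interact with the web, these are the numbers that map most closely to your workload.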

OpenAI is leaning into "knowledge work" as a category. Think polished documents, presentations, structured reports. The outputs are noticeably more formatted and complete than 5.3. Fewer rough edges. Better structure.
This is less relevant for developers and more relevant if you're using the API to generate client-facing content. But it signals where OpenAI sees the commercial opportunity: enterprise users who need production-ready documents, not raw text.
The computer use capabilities are where GPT 5.4 pulls ahead of the field. OSWorld Verified at 75% isn't just a benchmark win; it suggests the model can reliably execute multi-step workflows on a real desktop, browser included.
Navigate to a site. Find the right form. Fill it out. Submit. Verify the result. GPT 5.4 does this with higher reliability than any other model right now, including Opus 4.6.
If you're building browser automation agents, this is the model to test against.
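"Test against" in practice means running the same workflow many times and counting how often the end state verifies. The sketch below is a hypothetical harness: `run_trials`, `fake_agent`, and `TrialResult` are illustrative names, nothing here is an OpenAI API, and `fake_agent` is a toy stand-in you would replace with your real browser agent or driver.

```python
# Hypothetical harness: run one browser workflow N times and measure how often
# the final state passes verification. Not an OpenAI API; swap `fake_agent`
# for a wrapper around your real agent.
import random
from dataclasses import dataclass
from typing import Callable

STEPS = ("navigate", "find_form", "fill_form", "submit", "verify")

@dataclass
class TrialResult:
    completed_steps: int
    verified: bool

def run_trials(run_workflow: Callable[[int], TrialResult], n: int) -> float:
    """Return the fraction of n trials whose result passed verification."""
    if n <= 0:
        raise ValueError("n must be positive")
    passed = sum(1 for seed in range(n) if run_workflow(seed).verified)
    return passed / n

def fake_agent(seed: int, step_success: float = 0.95) -> TrialResult:
    """Toy stand-in: each step succeeds independently with `step_success`."""
    rng = random.Random(seed)  # seeded so each trial is reproducible
    for i, _step in enumerate(STEPS):
        if rng.random() >= step_success:
            return TrialResult(completed_steps=i, verified=False)
    return TrialResult(completed_steps=len(STEPS), verified=True)

rate = run_trials(fake_agent, 1000)
print(f"verified success rate: {rate:.1%}")
```

The toy model also shows why per-step reliability matters so much: at 95% per step across five steps, end-to-end success is only about 77% (0.95^5 ≈ 0.774).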
The coding demos are strong. Web games, 3D simulations, complex frontend layouts--all generated with fewer iterations than 5.3. The Cursor team gave positive feedback on integration quality, which matters more than synthetic benchmarks for day-to-day coding workflows.
Where it really shines is frontend. HTML/CSS/JS generation is tighter. Fewer layout bugs. Better responsive handling. If you're using an AI coding assistant for UI work, GPT 5.4 is worth switching to.
Standard pricing for the API:

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| GPT 5.4 | $2.50 | $10.00 |
| GPT 5.4 Thinking | $5.00 | $20.00 |

Context beyond 272k tokens: 2x multiplier on both input and output.
Compared with Opus 4.6 ($5 input / $25 output), GPT 5.4 is never more expensive. GPT 5.4 Thinking matches Opus on input and is 20% cheaper on output; the non-thinking variant costs half as much as Opus on input and 40% as much on output. If your workload doesn't need extended reasoning, that's significant savings at scale.
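To make the price gap concrete, here is a small per-request cost sketch using the list prices quoted in this article. The workload (10,000 requests a day at 20k input / 2k output tokens) is a made-up example; requests are assumed to stay under 272k tokens so no surcharge applies, and `output_tokens` is assumed to already include any reasoning tokens the Thinking model emits.

```python
# Per-request cost comparison using the list prices quoted in this article
# (USD per 1M tokens). Assumes requests stay under the 272k threshold and that
# output_tokens already includes any reasoning tokens.
PRICES = {
    "GPT 5.4": (2.50, 10.00),
    "GPT 5.4 Thinking": (5.00, 20.00),
    "Claude Opus 4.6": (5.00, 25.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request at the listed per-million-token rates."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Hypothetical workload: 10,000 requests/day, each 20k tokens in / 2k out.
REQUESTS_PER_DAY = 10_000
for model in PRICES:
    daily = request_cost(model, 20_000, 2_000) * REQUESTS_PER_DAY
    print(f"{model:<17} ${daily:>9,.2f}/day")
```

On that mix, GPT 5.4 comes to $700/day, GPT 5.4 Thinking to $1,400/day, and Opus 4.6 to $1,500/day. Output-heavy workloads widen the gap, since output is where the per-token prices differ most.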
The honest comparison: they're different tools for different jobs.
Opus 4.6 wins on: agentic terminal coding, long-horizon multi-step tasks, agent team coordination, agentic search. If you're running Claude Code with agent teams on complex codebases, Opus is still the frontier.
GPT 5.4 wins on: computer use, browser automation, frontend code generation, knowledge work output quality, and price-per-token. If you're building web agents or need polished document generation, GPT 5.4 is the better choice.
Neither model dominates everything. Pick based on your workload.
OpenAI also shipped a fast mode for Codex that runs 1.5x faster than the standard mode. If you're using Codex for batch code generation or CI pipelines, the speed improvement compounds.
This is a quiet but important update. Faster inference means tighter feedback loops. Tighter feedback loops mean more iterations per hour.
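A quick back-of-envelope sketch of what 1.5x buys in loops per hour. It assumes fast mode speeds up only the model's share of each loop, while time spent running tests or builds stays the same; the 60-second and 30-second timings are invented for illustration.

```python
# Back-of-envelope: how a 1.5x inference speedup changes loops per hour.
# Assumption: fast mode speeds up only the model's share of each loop; time
# spent running tests or builds (other_s) is unchanged. Timings are made up.
SPEEDUP = 1.5  # Codex fast mode figure quoted above

def iterations_per_hour(model_s: float, other_s: float, speedup: float = 1.0) -> float:
    """Feedback loops per hour when only model time is divided by `speedup`."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    loop_s = model_s / speedup + other_s
    return 3600 / loop_s

# Hypothetical loops: pure generation vs generation plus a 30 s test run.
for model_s, other_s in ((60.0, 0.0), (60.0, 30.0)):
    std = iterations_per_hour(model_s, other_s)
    fast = iterations_per_hour(model_s, other_s, SPEEDUP)
    print(f"model={model_s:.0f}s other={other_s:.0f}s: {std:.1f} -> {fast:.1f} loops/hour")
```

The full 1.5x shows up only when the model is the whole loop (60 to 90 loops per hour). With a test run in the loop, the gain in this example drops to about 1.29x (40 to roughly 51 loops per hour), so the biggest wins come in generation-heavy pipelines.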