GPT-5: OpenAI's Most Capable Model

Update (March 2026): OpenAI has since released GPT-5.3 and GPT-5.4 with significant improvements. This article covers the original GPT-5 launch.

A Unified Architecture That Thinks Before It Acts

GPT-5 introduces a fundamentally different approach to inference. Instead of forcing developers to manually configure reasoning parameters, the model operates as a unified system with real-time routing based on query complexity.

For model-selection context, compare this with OpenAI Codex: Cloud AI Coding With GPT-5.3 and OpenAI vs Anthropic in 2026 - Models, Tools, and Developer Experience; the useful question is not only benchmark quality, but where the model fits in a real developer workflow.

Tell it to "think hard" about a difficult problem, and it allocates additional compute. Ask a simple conversational question, and it responds immediately without burning tokens on unnecessary test-time compute. This dynamic routing eliminates the guesswork of selecting between fixed reasoning modes while keeping costs predictable.

Real-World Performance Beyond Benchmarks

OpenAI optimized GPT-5 for practical utility, not just leaderboard scores. The focus areas, writing, coding, and health, represent ChatGPT's most common use cases.

Hallucination rates are down. Instruction following is tighter. But the real difference shows up in qualitative output.

Front-End Coding Leap

The model demonstrates measurable improvements in front-end development. During demonstrations, GPT-5 generated complete interactive applications: a physics-based ball-rolling game, a pixel art canvas, a typing trainer, a drum simulator, and a lofi music environment. One standout example was a 3JS-style castle defense game with interactive balloon targeting, built entirely from a text prompt within Cursor.

Health Queries That Actually Feel Human

When asked about cancer risk factors, previous models like O3 responded with dry tables and bullet-point citations. GPT-5 leads with empathy: "I'm sorry you're dealing with this worry. Many people have the same question." The information is equally accurate, but the delivery respects the emotional weight of the query.

Health response comparison showing empathetic vs clinical outputs

Benchmark Analysis: Intelligence Per Token

Artificial Analysis' aggregate Intelligence Index, combining MMLU, GPQA Diamond, Humanity's Last Exam, and Live CodeBench, places GPT-5 (high mode) at state-of-the-art. Even GPT-5 medium outperforms the best competing models.

The efficiency curve is where it gets interesting. GPT-5 low ranks above Claude 4 Sonnet Thinking and approaches Qwen 3 235B, while using significantly fewer tokens. When plotting intelligence against output tokens consumed, GPT-5 dominates the curve, delivering superior results at lower cost and latency than Grok 4.

Benchmark comparison showing intelligence index vs token efficiency

Where It Wins and Where It Trails

GPT-5 takes best-in-class status on MMLU Pro, Humanity's Last Exam, AMIE medical evaluations, long-context tasks, and instruction following. GPQA Diamond still belongs to Grok 4. On Live CodeBench, it trails O4 mini (high) and Grok.

LM Arena human preference data shows GPT-5 beating Gemini 2.5 Pro on text responses and dominating WebDev Arena against Gemini 2.5 Pro, DeepSeek R1, and Claude 4 Opus.

ARC-AGI scores put GPT-5 high at 65.7 versus Grok 4's 66.7, but GPT-5 achieves this at roughly half the cost per task.

Get the weekly deep dive

Tutorials on Claude Code, AI agents, and dev tools - delivered free every week.

From the archive

Open Lovable: Re-Imagine Websites in Seconds

Aug 8, 2025 • 5 min read

GPT-OSS: OpenAI's First Open Source Model

Aug 6, 2025 • 6 min read

Augment's Task List: AI-Powered Development Planning

Aug 5, 2025 • 6 min read

Claude Code Sub Agents: Parallel AI Development

Jul 25, 2025 • 6 min read

The API: Four Models, One Architecture

The GPT-5 family launches with four variants:

Model	Input	Output	Use Case
GPT-5	$1.25/M	$10/M	Flagship performance
GPT-5 Mini	$0.25/M	$2/M	Balanced speed and capability
GPT-5 Nano	Lower cost	Lower cost	Latency-sensitive applications
GPT-5 Chat	Optimized	Optimized	Conversational interfaces

All four support multimodal inputs (text and image), function calling, structured outputs, and streaming. The flagship model adds predicted outputs for efficient code refactoring and text editing workflows.

Context window is 400,000 tokens across the board, with 128,000 max output tokens. Pricing undercuts Grok 4 and Claude 4 Sonnet Thinking ($3/$15 per million) while matching Gemini 2.5 Pro's rates with superior performance.

Developer Validation

Cognition's Junior Dev Eval, the benchmark behind the Devin coding agent, shows GPT-5 outperforming Sonnet and GPT-4.1 on exploration, planning, and code execution.

The Cursor CEO publicly called it the best coding model they've used to date. During OpenAI's livestream, the model resolved a GitHub issue in real-time. Both Windsurf and Cursor are offering GPT-5 access to users immediately.

Coding workflow demonstration in IDE environment

Availability

GPT-5 is rolling out to all ChatGPT users today. Plus subscribers receive expanded usage limits. Pro subscribers unlock GPT-5 Pro, the equivalent of API high mode, for extended reasoning on complex problems.

Frequently Asked Questions

Is GPT-5 better than Claude?

GPT-5 and Claude 4 (Opus, Sonnet) represent different design philosophies. GPT-5 leads on coding benchmarks, front-end development, and multimodal tasks. Claude 4 Opus excels at long-form writing, nuanced reasoning, and tasks requiring extended context. For pure coding performance in tools like Cursor, GPT-5 edges ahead. For agentic workflows with complex instructions, Claude often follows directions more reliably.

How much does GPT-5 cost?

GPT-5 flagship costs $1.25 per million input tokens and $10 per million output tokens. GPT-5 Mini runs at $0.25/$2 per million. This undercuts Grok 4 and Claude 4 Sonnet Thinking ($3/$15) while delivering competitive or superior performance. ChatGPT Plus subscribers get GPT-5 access included; Pro subscribers unlock GPT-5 Pro with extended reasoning.

What is GPT-5's context window?

GPT-5 supports a 400,000 token context window with up to 128,000 max output tokens. This matches the largest context windows available in 2026 and supports complex codebases, long documents, and multi-file analysis without chunking.

Is GPT-5 available in the API?

Yes. GPT-5, GPT-5 Mini, GPT-5 Nano, and GPT-5 Chat are all available via the OpenAI API. All variants support multimodal inputs (text and image), function calling, structured outputs, and streaming. The flagship model adds predicted outputs for efficient code refactoring.

Can I use GPT-5 in Cursor?

Yes. Cursor integrated GPT-5 on launch day. The Cursor CEO called it "the best coding model they've used to date." GPT-5 is available as a model option in Cursor settings, and Windsurf also offers GPT-5 access.

What happened to GPT-4.5?

OpenAI skipped the GPT-4.5 naming. The progression went from GPT-4 Turbo and GPT-4o to GPT-5, reflecting the significant architectural changes rather than an incremental update. The unified inference architecture with dynamic reasoning routing represented a larger leap than typical point releases.

How does GPT-5 compare to Gemini 2.5 Pro?

GPT-5 matches Gemini 2.5 Pro's pricing ($1.25/$10 per million tokens for flagship) while outperforming it on most benchmarks. LM Arena human preference data shows GPT-5 beating Gemini 2.5 Pro on both text responses and WebDev tasks. Gemini retains advantages in certain multimodal scenarios and Google ecosystem integration.

What is the difference between GPT-5 and GPT-5 Pro?

GPT-5 Pro is the extended reasoning mode available to ChatGPT Pro subscribers. It allocates additional compute for complex problems, equivalent to the API's "high" reasoning mode. Standard GPT-5 dynamically routes between reasoning modes based on query complexity, while GPT-5 Pro forces maximum reasoning allocation.

OpenAI's GPT 5.4 in 10 Minutes

GPT-5 Codex: OpenAI's Agentic Coding Model

OpenAI vs Anthropic in 2026 - Models, Tools, and Developer Experience

A Unified Architecture That Thinks Before It Acts