TL;DR
GPT-5.4 vs Gemini 3.1 Pro vs DeepSeek V4: pricing, benchmarks, context behavior, and license terms for the mid-tier models that carry most production traffic.
Direct answer
GPT-5.4 vs Gemini 3.1 Pro vs DeepSeek V4: pricing, benchmarks, context behavior, and license terms for the mid-tier models that carry most production traffic.
Best for
Developers comparing real tool tradeoffs before choosing a stack.
Covers
Verdict, tradeoffs, pricing signals, workflow fit, and related alternatives.
Read next
Same-day-verified llm api pricing june 2026: Claude Fable 5, GPT-5.5, Gemini 3.1 Pro, and DeepSeek V4 compared per million tokens, plus the three caveats that change the math.
10 min readClaude Fable 5 vs Gemini: how Anthropic's $10/$50 Mythos-class model compares to Gemini 3.1 Pro's $2/$12 preview on pricing, context, and benchmarks.
8 min readGPT-5.5 vs Claude Opus 4.8: both cost $5 per million input tokens, so the workhorse-tier decision comes down to output pricing, benchmarks, and tooling.
8 min readLast updated: June 11, 2026
Flagship comparisons get the headlines. Fable 5 at $10/$50, GPT-5.5-pro at $30/$180 - those are the models people argue about on launch day. But most production token spend runs a tier below: the $2-3 per million input token class that handles summarization queues, agent steps, RAG answers, and the long tail of app traffic that never needed a frontier model.
That tier has three serious contenders right now: OpenAI's GPT-5.4, Google's Gemini 3.1 Pro Preview, and DeepSeek's V4 Pro. They land within a few points of each other on the benchmarks that matter, so the decision comes down to pricing mechanics, context behavior, and license terms - the boring details that compound into real money at volume. Every price below is verified against the live vendor pages.
| GPT-5.4 | Gemini 3.1 Pro Preview | DeepSeek V4 Pro | |
|---|---|---|---|
| Vendor | OpenAI | DeepSeek | |
| Released | Mar 5, 2026 (GA) | Feb 19, 2026 (preview) | Apr 24, 2026 |
| Input ($/MTok) | $2.50 | $2.00 (<=200K) / $4.00 (>200K) | $0.435 (cache miss) |
| Output ($/MTok) | $15.00 | $12.00 (<=200K) / $18.00 (>200K) | $0.87 |
| Cached input ($/MTok) | $0.25 | $0.20 + $4.50/MTok/hr storage | $0.003625 |
| Context window | 272K standard, 1M via API | 1M | 1M |
| Max output tokens | 128K | 64K | 384K |
| SWE-bench Pro | 57.7% | 54.2% (Public) | 55.4% |
| License | Proprietary | Proprietary | MIT open weights |
All prices verified June 11, 2026 against the OpenAI pricing page, the Gemini API pricing page, and the DeepSeek pricing page. Benchmark figures come from third-party writeups (sources at the end) because vendor benchmark reporting is inconsistent across variants and scaffolds - treat the coding scores as a cluster, not a ranking.
The headline framing of "the $2-3 class" undersells how wide the spread actually is once you do the math.
GPT-5.4 runs $2.50 input / $15.00 output, with cached input at $0.25 and a 50% batch discount (verified June 11, 2026, developers.openai.com/api/docs/pricing). Gemini 3.1 Pro Preview is $2.00 / $12.00 for prompts up to 200K tokens, jumping to $4.00 / $18.00 above that, with batch at half price (verified June 11, 2026, ai.google.dev/gemini-api/docs/pricing).
Then there is DeepSeek V4 Pro: $0.435 input on a cache miss, $0.87 output, and a cache-hit input price of $0.003625 per million tokens (verified June 11, 2026, api-docs.deepseek.com/quick_start/pricing). That output price is roughly 17x cheaper than GPT-5.4 and about 14x cheaper than Gemini 3.1 Pro under the 200K threshold.
One thing worth flagging: April launch coverage, including Artificial Analysis and DataCamp, cited V4 Pro at $1.74 / $3.48. The official page as verified June 11, 2026 lists $0.435 / $0.87. The live page is the source of truth - and it says V4 Pro costs about a quarter of what most comparison articles still quote.
Concrete example: a workload pushing 10M input and 2M output tokens per month, no caching, all prompts under 200K:
GPT-5.4 and Gemini are within 25% of each other. DeepSeek is playing a different sport. If raw unit cost is your primary axis, this comparison is over before the benchmarks section - which is exactly why the benchmarks section matters.
All three discount repeated input, but the mechanics diverge. OpenAI's cached input is a flat $0.25 (0.1x base) with no storage fee. Gemini charges $0.20 per million for cache reads under 200K context plus $4.50 per million tokens per hour of cache storage, so idle caches cost money on spiky traffic. DeepSeek's cache hits are $0.003625 - close enough to free that input-side prompt architecture barely matters for cost. If your agent reuses a large system prompt thousands of times a day, these mechanics can outweigh the base rates. For the routing-layer view, see our LLM router comparison.
Get the weekly deep dive
Tutorials on Claude Code, AI agents, and dev tools - delivered free every week.
From the archive
Jun 11, 2026 • 8 min read
Jun 11, 2026 • 8 min read
Jun 11, 2026 • 10 min read
Jun 11, 2026 • 10 min read
On agentic coding, the three are clustered. The nxcode GPT-5.4 guide puts GPT-5.4 at 57.7% on SWE-bench Pro and roughly 80% on SWE-bench Verified. The nxcode Gemini 3.1 Pro guide lists Gemini 3.1 Pro at 54.2% on SWE-bench Pro (Public) and 80.6% on SWE-bench Verified. DataCamp's DeepSeek V4 analysis has V4 Pro at 55.4% on SWE-bench Pro and 67.9% on Terminal-Bench 2.0, versus Gemini's 68.5% on the same benchmark.
A 3.5-point spread reported by different evaluators with different scaffolds is not a ranking - it says all three do real agentic coding at a similar hit rate. The differentiation shows up off the center line:
One caveat from the Artificial Analysis data: V4 Pro burned 190M output tokens completing the Intelligence Index. DeepSeek thinks in volume, so part of the per-token advantage gets spent on extra tokens - V4 Pro still wins on total cost, but do not expect the full 17x in practice.
All three advertise 1M-token context, but the fine print differs in ways that bite.
GPT-5.4 gives you 272K standard, 1M via API - and input doubles to $5.00 per million above the 272K threshold per the nxcode guide; OpenAI's pricing page splits the model into short-context and long-context rows. Gemini 3.1 Pro tiers earlier: above 200K prompt tokens, input goes to $4.00 and output to $18.00 (verified June 11, 2026 on the Gemini pricing page). DeepSeek V4 Pro lists no long-context surcharge at all, plus the largest max output of the three at 384K tokens (verified June 11, 2026).
The asymmetry to remember: Gemini has the smallest max output (64K) and the earliest price cliff (200K), GPT-5.4 has the smallest standard context (272K), and DeepSeek is the only one where a 900K-token prompt costs the same per token as a 9K one. DataCamp reports V4 Pro at 83.5% on the MRCR 1M needle test, so the long context is not just nominal.
One operational note from DeepSeek's pricing page: V4 Pro is capped at 500 concurrent requests (V4 Flash gets 2,500). For high-fanout workloads, that ceiling is a real constraint the closed vendors do not impose at this tier.
This is the dimension where the three models genuinely fork.
DeepSeek V4 is MIT-licensed with downloadable weights - 1.6T total parameters, 49B active, an 865GB download, per DataCamp. You can self-host it, fine-tune it, and ship it in commercial products with no usage-based terms. The practical bar is high, but for teams with data-locality requirements it is the only option in this tier that fully removes the API dependency. We covered the economics in our DeepSeek V4 developer guide.
GPT-5.4 is proprietary but GA, with the most mature surrounding tooling, and OpenAI offers regional data-residency processing at a 10% uplift for models released on or after March 5, 2026 (verified June 11, 2026 on the pricing page). For setup details see our GPT-5.4 developer guide.
Gemini 3.1 Pro is the odd one out on status: nearly four months after its February 19 release, the model string is still gemini-3.1-pro-preview (verified June 11, 2026 on the pricing page), and Google's announcement promises GA "soon" without a date. Preview means Google reserves the right to change behavior and endpoints. Plenty of teams run preview models in production anyway, but if your change management cares about stability guarantees, the label matters. To evaluate it cheaply from the terminal first, our Gemini CLI guide covers the free-tier path.
deepseek-v4-pro with no preview label (the DeepSeek pricing page is even deprecating older model names in its favor).Skip up if your workload is long-horizon agentic work where completion rate dominates cost. A mid-tier model that fails a complex migration three times costs more than a frontier model that finishes once - the same math we walked through in Fable 5 vs DeepSeek V4 on cost versus quality. The 55-58% SWE-bench Pro cluster here converts directly into retries on genuinely hard autonomous tasks.
Skip down if your workload is classification, extraction, or short-form generation at scale. GPT-5.4-nano is $0.20 / $1.25 and DeepSeek V4 Flash is $0.14 / $0.28 (both verified June 11, 2026 on their pricing pages). For tasks that do not need mid-tier reasoning, even V4 Pro is overpaying.
Stay in this tier for everything in between, which for most products is the majority of tokens. The honest summary: GPT-5.4 buys polish and GA stability, Gemini 3.1 Pro buys reasoning headroom and the best closed-vendor price under 200K context, and DeepSeek V4 Pro buys an order of magnitude on cost in exchange for a 500-request concurrency cap, verbose reasoning-token usage, and a few months of capability lag.
Yes, by a wide margin on list price. Verified June 11, 2026 on the official pricing pages: V4 Pro is $0.435 input / $0.87 output per million tokens versus $2.50 / $15.00 for GPT-5.4 and $2.00 / $12.00 for Gemini 3.1 Pro under 200K context. DeepSeek emits more reasoning tokens per task, so realized savings are smaller than the sticker ratio, but still large.
They are close: GPT-5.4 reports 57.7% on SWE-bench Pro, DeepSeek V4 Pro 55.4%, and Gemini 3.1 Pro 54.2% (Public split), per the third-party sources below. Different evaluators produced these numbers, so treat them as a cluster and pick on price, context behavior, and tooling instead.
Yes. As of June 11, 2026, the Gemini API pricing page lists the model as gemini-3.1-pro-preview, and Google's announcement promises general availability "soon" without a date. It has been in preview since February 19, 2026.
Yes. The weights are MIT-licensed, but V4 Pro is a 1.6T-parameter mixture-of-experts model with an 865GB download, so self-hosting needs serious multi-GPU infrastructure. The smaller V4 Flash (284B total, 160GB) is the more realistic self-host target.
On two of the three. Gemini 3.1 Pro doubles input to $4.00 and raises output to $18.00 above 200K prompt tokens, and GPT-5.4 input doubles to $5.00 above its 272K standard window per third-party documentation. DeepSeek V4 Pro lists flat pricing across its full 1M context (all verified June 11, 2026).
Technical content at the intersection of AI and development. Building with AI agents, Claude Code, and modern dev tools - then showing you exactly how it works.
DeepSeek's reasoning-first model built for agents. First model to integrate thinking directly into tool use. Ships along...
View ToolGoogle's open-source coding CLI. Free tier with Gemini 2.5 Pro. Supports tool use, file editing, shell commands. 1M toke...
View ToolOpen-source terminal agent runtime with approval modes, rollback snapshots, MCP servers, LSP diagnostics, and a headless...
View ToolOpen-source AI pair programming in your terminal. Works with any LLM - Claude, GPT, Gemini, local models. Git-aware ed...
View ToolClaude Fable 5 vs Gemini: how Anthropic's $10/$50 Mythos-class model compares to Gemini 3.1 Pro's $2/$12 preview on pric...
Same-day-verified llm api pricing june 2026: Claude Fable 5, GPT-5.5, Gemini 3.1 Pro, and DeepSeek V4 compared per milli...
GPT-5.5 vs Claude Opus 4.8: both cost $5 per million input tokens, so the workhorse-tier decision comes down to output p...

A developer's comparison of OpenAI and Anthropic ecosystems - models, coding tools, APIs, pricing, and which to choose f...
Claude Agent SDK vs Claude Code explained: same engine, two surfaces. Here is the concrete decision line, plus where Man...
Claude Code Routines and Managed Agents scheduled deployments both run Claude on a schedule - here is how the triggers,...

New tutorials, open-source projects, and deep dives on coding agents - delivered weekly.