The Mid-Tier Shootout: GPT-5.4 vs Gemini 3.1 Pro vs DeepSeek V4 Pro

Last updated: June 11, 2026

Flagship comparisons get the headlines. Fable 5 at $10/$50, GPT-5.5-pro at $30/$180 - those are the models people argue about on launch day. But most production token spend runs a tier below: the $2-3 per million input token class that handles summarization queues, agent steps, RAG answers, and the long tail of app traffic that never needed a frontier model.

That tier has three serious contenders right now: OpenAI's GPT-5.4, Google's Gemini 3.1 Pro Preview, and DeepSeek's V4 Pro. They land within a few points of each other on the benchmarks that matter, so the decision comes down to pricing mechanics, context behavior, and license terms - the boring details that compound into real money at volume. Every price below was checked against the live vendor pages on June 11, 2026, except where a third-party source is named.

The Contenders at a Glance

	GPT-5.4	Gemini 3.1 Pro Preview	DeepSeek V4 Pro
Vendor	OpenAI	Google	DeepSeek
Released	Mar 5, 2026 (GA)	Feb 19, 2026 (preview)	Apr 24, 2026
Input ($/MTok)	$2.50	$2.00 (<=200K) / $4.00 (>200K)	$0.435 (cache miss)
Output ($/MTok)	$15.00	$12.00 (<=200K) / $18.00 (>200K)	$0.87
Cached input ($/MTok)	$0.25	$0.20 + $4.50/MTok/hr storage	$0.003625
Context window	272K standard, 1M via API	1M	1M
Max output tokens	128K	64K	384K
SWE-bench Pro	57.7%	54.2% (Public)	55.4%
License	Proprietary	Proprietary	MIT open weights

All prices verified June 11, 2026 against the OpenAI pricing page, the Gemini API pricing page, and the DeepSeek pricing page. Benchmark figures come from third-party writeups (sources at the end) because vendor benchmark reporting is inconsistent across variants and scaffolds - treat the coding scores as a cluster, not a ranking.

Pricing: Where the Tier Stops Being a Tier

The headline framing of "the $2-3 class" undersells how wide the spread actually is once you do the math.

GPT-5.4 runs $2.50 input / $15.00 output, with cached input at $0.25 and a 50% batch discount (verified June 11, 2026, developers.openai.com/api/docs/pricing). Gemini 3.1 Pro Preview is $2.00 / $12.00 for prompts up to 200K tokens, jumping to $4.00 / $18.00 above that, with batch at half price (verified June 11, 2026, ai.google.dev/gemini-api/docs/pricing).

Then there is DeepSeek V4 Pro: $0.435 input on a cache miss, $0.87 output, and a cache-hit input price of $0.003625 per million tokens (verified June 11, 2026, api-docs.deepseek.com/quick_start/pricing). That output price is roughly 17x cheaper than GPT-5.4 and about 14x cheaper than Gemini 3.1 Pro under the 200K threshold.

One thing worth flagging: April launch coverage, including Artificial Analysis and DataCamp, cited V4 Pro at $1.74 / $3.48. The official page as verified June 11, 2026 lists $0.435 / $0.87. The live page is the source of truth - and it says V4 Pro costs about a quarter of what most comparison articles still quote.

Concrete example: a workload pushing 10M input and 2M output tokens per month, no caching, all prompts under 200K:

GPT-5.4: $25.00 + $30.00 = $55.00
Gemini 3.1 Pro Preview: $20.00 + $24.00 = $44.00
DeepSeek V4 Pro: $4.35 + $1.74 = $6.09

GPT-5.4 and Gemini are within 25% of each other. DeepSeek is playing a different sport. If raw unit cost is your primary axis, this comparison is over before the benchmarks section - which is exactly why the benchmarks section matters.
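
To rerun that arithmetic with your own volumes, a minimal sketch follows. The prices are the list prices quoted above; the `PRICES` table and `monthly_cost` helper are illustrative names, not part of any vendor SDK, and the sketch assumes no caching, no batch discount, and all prompts under 200K tokens.

```python
# List prices in USD per million tokens, as quoted in the table above.
PRICES = {
    "GPT-5.4": {"input": 2.50, "output": 15.00},
    "Gemini 3.1 Pro Preview": {"input": 2.00, "output": 12.00},  # <=200K tier
    "DeepSeek V4 Pro": {"input": 0.435, "output": 0.87},  # cache-miss input
}


def monthly_cost(model: str, input_mtok: float, output_mtok: float) -> float:
    """Return the USD cost of a month's volume, given in millions of tokens."""
    price = PRICES[model]
    total = input_mtok * price["input"] + output_mtok * price["output"]
    return round(total, 2)


# The 10M input / 2M output workload from the example above.
for name in PRICES:
    print(f"{name}: ${monthly_cost(name, 10, 2):.2f}")
```

Changing the input-to-output mix shifts the ratios: output-heavy workloads widen DeepSeek's lead, because the output price gap (about 17x) is larger than the input gap (about 6x).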

Caching Mechanics Differ More Than the Sticker Prices

All three discount repeated input, but the mechanics diverge. OpenAI's cached input is a flat $0.25 (0.1x base) with no storage fee. Gemini charges $0.20 per million for cache reads under 200K context plus $4.50 per million tokens per hour of cache storage, so idle caches cost money on spiky traffic. DeepSeek's cache hits are $0.003625 - close enough to free that input-side prompt architecture barely matters for cost. If your agent reuses a large system prompt thousands of times a day, these mechanics can outweigh the base rates. For the routing-layer view, see our LLM router comparison.
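
A rough sketch of how the storage fee changes the picture on spiky traffic. It assumes every reuse is a cache hit, that the cache is kept alive for a full 24 hours, and that Gemini's storage fee is billed on the cached tokens for each hour held; `daily_cached_input_cost` is a hypothetical helper, and it ignores the one-time cost of writing the cache.

```python
HOURS_PER_DAY = 24


def daily_cached_input_cost(
    prompt_tokens: int,
    reuses_per_day: int,
    read_price: float,
    storage_price_per_mtok_hour: float = 0.0,
) -> float:
    """USD per day to re-read one cached prompt, plus any storage fee.

    Assumes every reuse hits the cache and the cache lives all 24 hours.
    """
    prompt_mtok = prompt_tokens / 1_000_000
    read_cost = prompt_mtok * reuses_per_day * read_price
    storage_cost = prompt_mtok * storage_price_per_mtok_hour * HOURS_PER_DAY
    return read_cost + storage_cost


# A 50K-token system prompt at two traffic levels (illustrative volumes).
for reuses in (5_000, 50):
    openai = daily_cached_input_cost(50_000, reuses, 0.25)
    gemini = daily_cached_input_cost(50_000, reuses, 0.20, 4.50)
    deepseek = daily_cached_input_cost(50_000, reuses, 0.003625)
    print(f"{reuses} reuses/day: OpenAI ${openai:.2f}, "
          f"Gemini ${gemini:.2f}, DeepSeek ${deepseek:.2f}")
```

Under these assumptions the storage fee adds a fixed $5.40 per day for a 50K-token cache, so Gemini's lower read price only beats OpenAI's flat rate above roughly 2,160 reuses per day; below that, the idle cache costs more than it saves.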

Benchmarks: Close Enough That Price Is the Tiebreaker

On agentic coding, the three are clustered. The nxcode GPT-5.4 guide puts GPT-5.4 at 57.7% on SWE-bench Pro and roughly 80% on SWE-bench Verified. The nxcode Gemini 3.1 Pro guide lists Gemini 3.1 Pro at 54.2% on SWE-bench Pro (Public) and 80.6% on SWE-bench Verified. DataCamp's DeepSeek V4 analysis has V4 Pro at 55.4% on SWE-bench Pro and 67.9% on Terminal-Bench 2.0, versus Gemini's 68.5% on the same benchmark.

A 3.5-point spread reported by different evaluators with different scaffolds is not a ranking - it says all three do real agentic coding at a similar hit rate. The differentiation shows up off the center line:

GPT-5.4 reports 75% on OSWorld, which nxcode notes is above the 72.4% human expert baseline for desktop automation. If your workload involves computer use, GPT-5.4 has the strongest claim in this tier.
Gemini 3.1 Pro posted a verified 77.1% on ARC-AGI-2, more than double Gemini 3 Pro, per Google's announcement, plus 94.3% on GPQA Diamond per nxcode. For novel-reasoning and science-heavy workloads, Gemini has the edge.
DeepSeek V4 Pro scores 52 on the Artificial Analysis Intelligence Index, ranking #2 among open-weights models, and its GDPval-AA of 1554 leads the open-weights field, per Artificial Analysis. DataCamp's framing is the honest one: it trails closed state of the art by 3 to 6 months, at a fraction of the price.

One caveat from the Artificial Analysis data: V4 Pro burned 190M output tokens completing the Intelligence Index. It emits long reasoning traces, so part of the per-token advantage gets spent on extra output tokens - V4 Pro still wins on total cost, but do not expect the full 17x output-price gap to show up in practice.
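
One way to reason about that discount is to divide the price ratio by a verbosity multiplier. The sketch below does that for output tokens only; the multiplier of 2.0 is a made-up placeholder, since the sources here do not give a per-task token comparison against GPT-5.4, and both function names are illustrative.

```python
def effective_output_advantage(
    price_a: float, price_b: float, verbosity_b_over_a: float
) -> float:
    """How many times cheaper model B is than model A per task, output only.

    verbosity_b_over_a is how many output tokens B emits for each token A
    emits on the same task (a value you would have to measure yourself).
    """
    if verbosity_b_over_a <= 0:
        raise ValueError("verbosity multiplier must be positive")
    return (price_a / price_b) / verbosity_b_over_a


def break_even_verbosity(price_a: float, price_b: float) -> float:
    """Verbosity multiplier at which B's per-task output cost equals A's."""
    return price_a / price_b


# GPT-5.4 output at $15.00 vs DeepSeek V4 Pro at $0.87, with a
# hypothetical 2x verbosity multiplier for V4 Pro.
print(round(effective_output_advantage(15.00, 0.87, 2.0), 2))
print(round(break_even_verbosity(15.00, 0.87), 2))
```

The break-even value is the useful one: V4 Pro would have to emit about 17 times as many output tokens per task as GPT-5.4 before the output-side cost advantage disappeared.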

Context Windows and Long-Context Behavior

All three advertise 1M-token context, but the fine print differs in ways that bite.

GPT-5.4 gives you 272K standard, 1M via API - and input doubles to $5.00 per million above the 272K threshold per the nxcode guide; OpenAI's pricing page splits the model into short-context and long-context rows. Gemini 3.1 Pro tiers earlier: above 200K prompt tokens, input goes to $4.00 and output to $18.00 (verified June 11, 2026 on the Gemini pricing page). DeepSeek V4 Pro lists no long-context surcharge at all, plus the largest max output of the three at 384K tokens (verified June 11, 2026).

The asymmetry to remember: Gemini has the smallest max output (64K) and the earliest price cliff (200K), GPT-5.4 has the smallest standard context (272K), and DeepSeek is the only one where a 900K-token prompt costs the same per token as a 9K one. DataCamp reports V4 Pro at 83.5% on the MRCR 1M needle test, so the long context is not just nominal.
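
The price cliffs are easier to see as a function of prompt size. The sketch below covers input cost only and assumes that once a prompt crosses the threshold, the whole prompt is billed at the long-context rate rather than just the tokens above it; check each vendor's page before relying on that. `InputPricing` and `input_cost` are illustrative names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InputPricing:
    base: float                        # USD per million input tokens
    threshold: Optional[int] = None    # prompt size where the long rate starts
    long_rate: Optional[float] = None  # USD per million above the threshold


# Input rates and thresholds as quoted in this section.
PRICING = {
    "GPT-5.4": InputPricing(2.50, 272_000, 5.00),
    "Gemini 3.1 Pro Preview": InputPricing(2.00, 200_000, 4.00),
    "DeepSeek V4 Pro": InputPricing(0.435),  # no long-context surcharge listed
}


def input_cost(model: str, prompt_tokens: int) -> float:
    """USD input cost of one uncached prompt.

    Assumption: the long rate applies to the entire prompt once the
    threshold is exceeded.
    """
    pricing = PRICING[model]
    rate = pricing.base
    if pricing.threshold is not None and prompt_tokens > pricing.threshold:
        rate = pricing.long_rate
    return prompt_tokens / 1_000_000 * rate


for name in PRICING:
    print(f"{name}: 9K=${input_cost(name, 9_000):.4f}, "
          f"900K=${input_cost(name, 900_000):.4f}")
```

Under that assumption a 900K-token prompt costs $4.50 in input on GPT-5.4, $3.60 on Gemini 3.1 Pro Preview, and about $0.39 on DeepSeek V4 Pro.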

One operational note from DeepSeek's pricing page: V4 Pro is capped at 500 concurrent requests (V4 Flash gets 2,500). For high-fanout workloads, that ceiling is a real constraint the closed vendors do not impose at this tier.
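
If you fan out wider than that, the usual client-side answer is to bound in-flight requests yourself. Here is a minimal sketch using `asyncio.Semaphore`; `call_model` is a hypothetical stand-in for a real API call, and whether the 500-request cap is enforced per account or per key is not something this article's sources specify.

```python
import asyncio

MAX_CONCURRENCY = 500  # V4 Pro concurrency cap cited above


async def call_model(prompt: str) -> str:
    """Hypothetical stand-in for a real API call; replace with your client."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"response to {prompt}"


async def run_all(prompts: list[str], limit: int = MAX_CONCURRENCY) -> list[str]:
    """Run every prompt with at most `limit` requests in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(prompt: str) -> str:
        # Tasks beyond the limit wait here until a slot frees up.
        async with semaphore:
            return await call_model(prompt)

    # gather preserves the order of the input prompts.
    return await asyncio.gather(*(guarded(p) for p in prompts))


results = asyncio.run(run_all([f"task {i}" for i in range(1_200)]))
print(len(results))
```

In production you would likely set the limit below the cap to leave headroom for other services sharing the same quota, and add retry handling for rejected requests.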

License and Deployment Terms

This is the dimension where the three models genuinely fork.

DeepSeek V4 is MIT-licensed with downloadable weights - 1.6T total parameters, 49B active, an 865GB download, per DataCamp. You can self-host it, fine-tune it, and ship it in commercial products with no usage-based terms. The practical bar is high, but for teams with data-locality requirements it is the only option in this tier that fully removes the API dependency. We covered the economics in our DeepSeek V4 developer guide.

GPT-5.4 is proprietary but GA, with the most mature surrounding tooling, and OpenAI offers regional data-residency processing at a 10% uplift for models released on or after March 5, 2026 (verified June 11, 2026 on the pricing page). For setup details see our GPT-5.4 developer guide.

Gemini 3.1 Pro is the odd one out on status: nearly four months after its February 19 release, the model string is still gemini-3.1-pro-preview (verified June 11, 2026 on the pricing page), and Google's announcement promises GA "soon" without a date. Preview means Google reserves the right to change behavior and endpoints. Plenty of teams run preview models in production anyway, but if your change management cares about stability guarantees, the label matters. To evaluate it cheaply from the terminal first, our Gemini CLI guide covers the free-tier path.

Decision Guide by Persona

Cost-driven, high-volume production traffic. DeepSeek V4 Pro, and it is not close. At $0.435 / $0.87 with near-free cache hits, the same budget buys roughly 9x the volume of GPT-5.4 at the 5:1 input-to-output mix from the pricing example ($55.00 vs $6.09). Budget for its verbose reasoning style and the 500-request concurrency cap. Our budget coding agents writeup has real-world patterns.
Computer-use and desktop automation agents. GPT-5.4. The 75% OSWorld figure is the standout benchmark in this tier, and GA status reduces integration risk.
Long-document analysis under 200K tokens. Gemini 3.1 Pro Preview. At $2.00 / $12.00 it is the cheapest closed-vendor option, and its reasoning scores lead the tier. Architect around the 200K price cliff and 64K output ceiling.
Compliance, data locality, or vendor independence. DeepSeek V4 Pro via self-hosted MIT weights, or GPT-5.4 with regional processing if self-hosting is off the table.
Preview-averse enterprises. GPT-5.4 or DeepSeek V4 Pro. Gemini 3.1 Pro is the only one of the three still labeled preview as of June 11, 2026 - GPT-5.4 is GA, and DeepSeek's API docs list deepseek-v4-pro with no preview label (the DeepSeek pricing page is even deprecating older model names in its favor).

When to Skip This Tier

Skip up if your workload is long-horizon agentic work where completion rate dominates cost. A mid-tier model that fails a complex migration three times costs more than a frontier model that finishes once - the same math we walked through in Fable 5 vs DeepSeek V4 on cost versus quality. The 54-58% SWE-bench Pro cluster here converts directly into retries on genuinely hard autonomous tasks.
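
The retry math can be sketched as expected cost per completed task. This assumes independent attempts, that every failed attempt is paid for in full, and that you retry until one succeeds, so the expected number of attempts is 1 / success rate. The dollar figures and success rates below are invented for illustration and are not measurements of any model.

```python
def expected_cost_per_success(cost_per_attempt: float, success_rate: float) -> float:
    """Expected USD spent to get one completed task.

    Assumes independent attempts retried until success, so the expected
    number of attempts is 1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


# Hypothetical numbers: a cheap model that usually fails a hard task
# versus a pricier one that usually finishes it.
mid_tier = expected_cost_per_success(cost_per_attempt=1.00, success_rate=0.25)
frontier = expected_cost_per_success(cost_per_attempt=3.00, success_rate=0.90)
print(f"mid-tier: ${mid_tier:.2f}, frontier: ${frontier:.2f}")
```

With these placeholder inputs the model that is 3x more expensive per attempt comes out cheaper per finished task, and that is before counting the engineering time spent supervising retries.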

Skip down if your workload is classification, extraction, or short-form generation at scale. GPT-5.4-nano is $0.20 / $1.25 and DeepSeek V4 Flash is $0.14 / $0.28 (both verified June 11, 2026 on their pricing pages). For tasks that do not need mid-tier reasoning, even V4 Pro is overpaying.

Stay in this tier for everything in between, which for most products is the majority of tokens. The honest summary: GPT-5.4 buys polish and GA stability, Gemini 3.1 Pro buys reasoning headroom and the best closed-vendor price under 200K context, and DeepSeek V4 Pro buys an order of magnitude on cost in exchange for a 500-request concurrency cap, verbose reasoning-token usage, and a few months of capability lag.

FAQ

Is DeepSeek V4 Pro really cheaper than GPT-5.4 and Gemini 3.1 Pro?

Yes, by a wide margin on list price. Verified June 11, 2026 on the official pricing pages: V4 Pro is $0.435 input / $0.87 output per million tokens versus $2.50 / $15.00 for GPT-5.4 and $2.00 / $12.00 for Gemini 3.1 Pro under 200K context. DeepSeek emits more reasoning tokens per task, so realized savings are smaller than the sticker ratio, but still large.

Which model is best for coding in the mid-tier class?

They are close: GPT-5.4 reports 57.7% on SWE-bench Pro, DeepSeek V4 Pro 55.4%, and Gemini 3.1 Pro 54.2% (Public split), per the third-party sources below. Different evaluators produced these numbers, so treat them as a cluster and pick on price, context behavior, and tooling instead.

Is Gemini 3.1 Pro still in preview?

Yes. As of June 11, 2026, the Gemini API pricing page lists the model as gemini-3.1-pro-preview, and Google's announcement promises general availability "soon" without a date. It has been in preview since February 19, 2026.

Can I self-host DeepSeek V4 Pro?

Yes. The weights are MIT-licensed, but V4 Pro is a 1.6T-parameter mixture-of-experts model with an 865GB download, so self-hosting needs serious multi-GPU infrastructure. The smaller V4 Flash (284B total, 160GB) is the more realistic self-host target.

Does long context cost extra on these models?

On two of the three. Gemini 3.1 Pro doubles input to $4.00 and raises output to $18.00 above 200K prompt tokens, and GPT-5.4 input doubles to $5.00 above its 272K standard window per third-party documentation. DeepSeek V4 Pro lists flat pricing across its full 1M context (Gemini and DeepSeek figures verified June 11, 2026 on their pricing pages).

Sources

OpenAI API pricing - GPT-5.4 family pricing, caching, batch, regional uplift (accessed June 11, 2026)
Gemini API pricing - Gemini 3.1 Pro Preview tiers, caching, batch (accessed June 11, 2026)
DeepSeek API pricing - V4 Pro and V4 Flash pricing, context, concurrency, deprecations (accessed June 11, 2026)
Google: Gemini 3.1 Pro announcement - release date, ARC-AGI-2, availability (accessed June 11, 2026)
nxcode: GPT-5.4 complete guide - GPT-5.4 benchmarks, variants, context tiers (accessed June 11, 2026)
nxcode: Gemini 3.1 Pro complete guide - Gemini 3.1 Pro benchmark set (accessed June 11, 2026)
DataCamp: DeepSeek V4 - V4 specs, benchmarks, positioning (accessed June 11, 2026)
Artificial Analysis: DeepSeek V4 Pro and V4 Flash - Intelligence Index, GDPval-AA, token usage (accessed June 11, 2026)