TL;DR
Same-day-verified LLM API pricing for June 2026: Claude Fable 5, GPT-5.5, Gemini 3.1 Pro, and DeepSeek V4 compared per million tokens, plus the three caveats that change the math.
Last updated: June 11, 2026
Sticker prices for model APIs have never been this spread out. At the top of the June 2026 table, GPT-5.5-pro charges $180 per million output tokens. At the bottom, DeepSeek V4 Flash charges $0.28. That is a 643x gap between two models you can call with nearly identical OpenAI-style request bodies.
This post is the raw API rate card: every current frontier and workhorse model from Anthropic, OpenAI, Google, and DeepSeek, with every number pulled from the live first-party pricing page and stamped with a verification date. If you are pricing coding tool subscriptions like Claude Code or Cursor plans instead, that is a different market with different math - see the companion post on AI coding tool pricing.
Three structural observations matter more than any single row: the DeepSeek output floor, Anthropic dropping its long-context premium while Gemini kept tiering, and the Claude tokenizer change that quietly skews per-MTok comparisons.
All prices in USD per million tokens (MTok), standard synchronous API, verified June 11, 2026 against each provider's live pricing page (linked per section below).
| Model | Input | Cached input read | Output |
|---|---|---|---|
| Claude Fable 5 | $10.00 | $1.00 | $50.00 |
| Claude Opus 4.8 | $5.00 | $0.50 | $25.00 |
| Claude Sonnet 4.6 | $3.00 | $0.30 | $15.00 |
| Claude Haiku 4.5 | $1.00 | $0.10 | $5.00 |
| GPT-5.5-pro | $30.00 | n/a | $180.00 |
| GPT-5.5 | $5.00 | $0.50 | $30.00 |
| GPT-5.4 | $2.50 | $0.25 | $15.00 |
| GPT-5.4-mini | $0.75 | $0.075 | $4.50 |
| GPT-5.4-nano | $0.20 | $0.02 | $1.25 |
| Gemini 3.1 Pro Preview | $2.00 (over 200K: $4.00) | $0.20 (over 200K: $0.40) plus storage | $12.00 (over 200K: $18.00) |
| Gemini 3.5 Flash | $1.50 | $0.15 plus storage | $9.00 |
| Gemini 3 Flash Preview | $0.50 | $0.05 plus storage | $3.00 |
| DeepSeek V4 Pro | $0.435 | $0.003625 | $0.87 |
| DeepSeek V4 Flash | $0.14 | $0.0028 | $0.28 |
Source pages, all verified June 11, 2026:
A few quick reads off the table. DeepSeek V4 Flash output at $0.28 is roughly 179x cheaper than Claude Fable 5 output at $50, and roughly 107x cheaper than GPT-5.5 at $30 (all verified June 11, 2026 on the pages above). Claude Opus 4.8 and GPT-5.5 are at parity on input at $5, with Claude cheaper on output ($25 vs $30). Gemini caching is the odd one out: it bills a per-token cache rate plus an hourly storage fee ($1.00 per MTok per hour on the Flash models, $4.50 on 3.1 Pro Preview), so a cache you hold but rarely hit can cost more than no cache.
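To make the Gemini storage caveat concrete, here is a minimal Python sketch of the break-even point, using only the Gemini rates quoted above. The function name is hypothetical, and the model is deliberately simplified: it assumes every hit re-reads the whole cached prefix and it ignores the cost of the initial cache write, so treat it as an illustration rather than Google's billing logic.

```python
def breakeven_hits_per_hour(input_rate, cached_rate, storage_rate_per_hour):
    """Cache hits per hour needed before a held cache beats no cache.

    All rates are USD per million tokens. Assumes each hit re-reads the
    entire cached prefix, so the prefix size cancels out of the ratio.
    """
    saving_per_hit = input_rate - cached_rate
    if saving_per_hit <= 0:
        raise ValueError("cached rate must be below the input rate")
    return storage_rate_per_hour / saving_per_hit


# Rates from the table and paragraph above (under-200K tier for 3.1 Pro).
gemini_3_flash = breakeven_hits_per_hour(0.50, 0.05, 1.00)
gemini_31_pro = breakeven_hits_per_hour(2.00, 0.20, 4.50)

print(f"Gemini 3 Flash Preview: {gemini_3_flash:.2f} hits/hour")
print(f"Gemini 3.1 Pro Preview: {gemini_31_pro:.2f} hits/hour")
```

Under these assumptions, a cache that is read fewer than about two to three times an hour costs more in storage than it saves on input.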
Anthropic's pricing page carries a note that is easy to skim past: Opus 4.7 and later models, including Fable 5, use a new tokenizer that "may use up to 35% more tokens for the same fixed text" (Anthropic pricing docs, verified June 11, 2026). The models overview puts the typical figure at roughly 30% (verified June 11, 2026).
This matters for cross-provider comparisons. Fable 5 versus Opus 4.8 is apples to apples, since both use the new tokenizer. But if you compare either against Sonnet 4.5-era baselines, or against another provider using your old token counts, the same prompt now consumes up to a third more billable tokens. The honest comparison requires re-counting your actual prompts with Anthropic's token counting endpoint, not multiplying old counts by new rates. For workload-level math, see Fable 5 production cost modeling.
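As a rough planning aid only, the tokenizer note can be turned into a range on projected spend. The sketch below is hypothetical helper code, not Anthropic's method: it scales a token count measured on an older model by the typical figure (roughly 30%) and the documented worst case (35%). The monthly volume is an invented example. Re-counting real prompts with the token counting endpoint remains the accurate approach.

```python
def projected_cost(old_tokens, rate_per_mtok, inflation):
    """Estimate cost in USD after a tokenizer change.

    old_tokens: count measured with the older tokenizer
    rate_per_mtok: USD per million tokens on the newer model
    inflation: fractional token increase, e.g. 0.35 for 35%
    """
    new_tokens = old_tokens * (1 + inflation)
    return new_tokens / 1_000_000 * rate_per_mtok


# Hypothetical workload: 200M input tokens per month, counted the old way,
# priced at the Opus 4.8 input rate of $5.00 per MTok.
old_monthly_tokens = 200_000_000
naive = projected_cost(old_monthly_tokens, 5.00, 0.0)
typical = projected_cost(old_monthly_tokens, 5.00, 0.30)
worst = projected_cost(old_monthly_tokens, 5.00, 0.35)

print(f"naive ${naive:,.0f}, typical ${typical:,.0f}, worst case ${worst:,.0f}")
```

The gap between the naive figure and the other two is the amount a budget would miss by multiplying old counts by new rates.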
Claude Fable 5, Opus 4.8, Opus 4.7, Opus 4.6, and Sonnet 4.6 now include the full 1M token context window at standard pricing. Anthropic's pricing page states it plainly: a 900K-token request bills at the same per-token rate as a 9K-token request, and caching and batch discounts apply at standard rates across the full window (verified June 11, 2026).
Gemini still tiers. Gemini 3.1 Pro Preview doubles input from $2 to $4 and lifts output from $12 to $18 once a prompt crosses 200K tokens, and the cache rate doubles too (Gemini API pricing, verified June 11, 2026). Gemini 2.5 Pro tiers the same way at the same boundary. So the headline "Gemini Pro is cheaper than Sonnet" claim flips for genuinely long-context work: above 200K tokens, Gemini 3.1 Pro input ($4) exceeds Sonnet 4.6 input ($3), which does not tier at all. OpenAI tiers as well, framed as separate long-context rates: GPT-5.5 rises from $5 input / $30 output to $10 / $45 in the long-context band, and GPT-5.5-pro goes from $30 / $180 to $60 / $270 (OpenAI pricing, verified June 11, 2026). Of the four providers, only Anthropic and DeepSeek charge one flat rate across the full window.
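The flip can be sketched in a few lines of Python using the input rates quoted above. Two things here are assumptions rather than anything the pricing pages are quoted as saying: that a prompt above 200K tokens is billed entirely at the higher tier, and that the same prompt produces the same token count on both providers. The function names are illustrative.

```python
def gemini_31_pro_input_cost(prompt_tokens):
    """Input cost in USD for Gemini 3.1 Pro Preview.

    Assumption: a prompt above 200K tokens is billed entirely at the
    higher tier, rather than only the tokens past the boundary.
    """
    rate = 4.00 if prompt_tokens > 200_000 else 2.00
    return prompt_tokens / 1_000_000 * rate


def sonnet_46_input_cost(prompt_tokens):
    """Input cost in USD for Claude Sonnet 4.6, flat at $3.00 per MTok."""
    return prompt_tokens / 1_000_000 * 3.00


for tokens in (100_000, 200_000, 300_000, 900_000):
    gemini = gemini_31_pro_input_cost(tokens)
    sonnet = sonnet_46_input_cost(tokens)
    cheaper = "Gemini 3.1 Pro" if gemini < sonnet else "Sonnet 4.6"
    print(f"{tokens:>7,} tokens: Gemini ${gemini:.2f}, Sonnet ${sonnet:.2f} -> {cheaper}")
```

This compares input cost only. Output rates, caching, and tokenizer differences between providers would all shift the totals for a real workload.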
All four providers discount aggressively, but the shapes are different.
If your workload is a chat or agent loop that resends a large stable prefix every turn, effective cost is dominated by the cache read rate, not the headline input rate. That reshuffles the table substantially in DeepSeek's and Anthropic's favor.
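A back-of-envelope Python sketch shows how strongly the cache read rate dominates in such a loop. The rates come from the table above; the prefix size, the turn count, and the helper itself are hypothetical, and the model ignores cache-write surcharges, Gemini's hourly storage fee, cache expiry, and the growing non-cached part of each prompt.

```python
def loop_input_cost(prefix_tokens, turns, input_rate, cached_rate):
    """Input cost in USD of resending a stable prefix across a conversation.

    Simplifying assumptions: the first turn pays the full input rate, every
    later turn hits the cache, and cache-write surcharges, storage fees, and
    the growing non-cached suffix are ignored.
    """
    first = prefix_tokens / 1_000_000 * input_rate
    rest = (turns - 1) * prefix_tokens / 1_000_000 * cached_rate
    return first + rest


# (input, cached read) rates in USD per MTok, from the table above.
RATES = {
    "Claude Sonnet 4.6": (3.00, 0.30),
    "GPT-5.4": (2.50, 0.25),
    "Gemini 3.1 Pro Preview": (2.00, 0.20),
    "DeepSeek V4 Pro": (0.435, 0.003625),
}

# Hypothetical agent loop: 50K-token prefix resent over 40 turns.
for model, (input_rate, cached_rate) in RATES.items():
    cached = loop_input_cost(50_000, 40, input_rate, cached_rate)
    uncached = loop_input_cost(50_000, 40, input_rate, input_rate)
    print(f"{model}: ${cached:.4f} cached vs ${uncached:.4f} uncached")
```

Passing the input rate as the cached rate gives the no-cache baseline, which makes the two columns directly comparable.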
Matching models by intended tier rather than by name, verified June 11, 2026:
| Tier | Cheapest | Mid | Premium |
|---|---|---|---|
| Max capability | - | Claude Fable 5 ($10 / $50) | GPT-5.5-pro and GPT-5.4-pro ($30 / $180) |
| Frontier workhorse | Gemini 3.1 Pro Preview ($2 / $12 under 200K) | Claude Opus 4.8 ($5 / $25) | GPT-5.5 ($5 / $30) |
| Production mid-tier | DeepSeek V4 Pro ($0.435 / $0.87) | GPT-5.4 ($2.50 / $15) | Claude Sonnet 4.6 ($3 / $15) |
| Fast and cheap | DeepSeek V4 Flash ($0.14 / $0.28) | GPT-5.4-nano ($0.20 / $1.25) | Claude Haiku 4.5 ($1 / $5) |
The striking row is max capability. Fable 5 at $10/$50 is the cheapest model in its claimed tier by a wide margin: GPT-5.5-pro costs 3x more on input and 3.6x more on output. Whether Fable 5 and GPT-5.5-pro actually belong in the same tier is a benchmark question, not a pricing one, but the price positioning is unambiguous. There is a deeper dive in the Fable 5 cost-per-task analysis.
Side projects and prototypes. DeepSeek V4 Flash ($0.14 / $0.28) or Gemini 3 Flash Preview ($0.50 / $3.00, with a free tier) are hard to argue with. GPT-5.4-nano is the cheapest input rate among the closed providers at $0.20. At these prices, model choice is a quality decision, not a budget one. The open-weights angle adds another dimension - see notes on DeepSeek's open-weights economics.
Production apps with steady traffic. The mid-tier triangle is GPT-5.4 ($2.50 / $15), Sonnet 4.6 ($3 / $15), and Gemini 3.1 Pro Preview ($2 / $12 under 200K). Identical or near-identical output rates mean the decision comes down to input volume, caching shape, and evals. If your prompts regularly exceed 200K tokens, Gemini's tier flip moves it from cheapest to most expensive of the three.
Agentic and long-horizon workloads. Output tokens dominate agent spend, and output is where the providers diverge most. Opus 4.8 ($25 output) undercuts GPT-5.5 ($30) at the frontier workhorse tier. Fable 5 at $50 output only pays off when its quality reduces retries and total tokens. Batch APIs (50% off at Anthropic, OpenAI, and Google) are the easiest structural saving for any agent pipeline that is not latency-sensitive.
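The "pays off only if it reduces retries" point can be expressed as expected cost per successful task. The sketch below is illustrative: the helper is hypothetical, the success rates and token counts are made-up inputs rather than benchmark results, and it assumes failed attempts are retried independently until one succeeds.

```python
def cost_per_success(output_tokens, output_rate, success_rate, batch=False):
    """Expected output cost in USD per successfully completed task.

    Assumes failed attempts are retried independently until one succeeds,
    so the expected number of attempts is 1 / success_rate. Input tokens
    are left out to keep the comparison focused on output spend.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    per_attempt = output_tokens / 1_000_000 * output_rate
    if batch:
        per_attempt *= 0.5  # 50% batch discount quoted above
    return per_attempt / success_rate


# Hypothetical numbers: 20K output tokens per attempt; success rates are
# invented for illustration and are not benchmark results.
opus = cost_per_success(20_000, 25.00, success_rate=0.60)
fable = cost_per_success(20_000, 50.00, success_rate=0.90)
print(f"Opus 4.8 ${opus:.3f} vs Fable 5 ${fable:.3f} per successful task")
```

With a 2x output rate and equal tokens per attempt, this model has Fable 5 breaking even only when its success rate is at least double that of Opus 4.8, or when it finishes the task in fewer output tokens.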
Teams routing across providers. With four providers publishing OpenAI-compatible or near-compatible APIs and a 643x output price spread, routing cheap-by-default with escalation on failure is increasingly the rational architecture. The LLM router comparison covers the tooling.
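A minimal sketch of cheap-by-default routing with escalation might look like the following. Everything here is a stand-in: the `route` function, the tier names, and the two fake model callables are hypothetical, and a real router would wrap provider SDK calls plus an eval or schema check that decides what counts as a failure.

```python
from typing import Callable, Optional, Sequence, Tuple

# A model call returns the answer text, or None if it failed validation.
ModelCall = Callable[[str], Optional[str]]


def route(prompt: str, tiers: Sequence[Tuple[str, ModelCall]]) -> Tuple[str, str]:
    """Try each tier from cheapest to most expensive until one succeeds.

    Returns (tier_name, answer). Raises RuntimeError if every tier fails.
    """
    for name, call in tiers:
        answer = call(prompt)
        if answer is not None:
            return name, answer
    raise RuntimeError("all tiers failed")


# Stand-in callables used only to exercise the control flow.
def cheap_model(prompt: str) -> Optional[str]:
    return None if "hard" in prompt else f"cheap answer to: {prompt}"


def premium_model(prompt: str) -> Optional[str]:
    return f"premium answer to: {prompt}"


tiers = [("cheap", cheap_model), ("premium", premium_model)]
print(route("easy question", tiers))
print(route("hard question", tiers))
```

Ordering the tiers by price keeps the policy in data rather than code, so adding a mid-tier model is a one-line change to the list.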
Compliance-constrained teams. Residency costs extra everywhere: Anthropic applies a 1.1x multiplier for US-only inference on Opus 4.6, Sonnet 4.6, and later models, and OpenAI charges a 10% uplift for regional data-residency processing on models released on or after March 5, 2026 (both verified June 11, 2026 on the pricing pages above).
Cheaper sticker prices do not automatically mean cheaper bills, and there are real reasons to stay put:
The deepseek-chat and deepseek-reasoner model names retire on July 24, 2026 in favor of the V4 model IDs (verified June 11, 2026), so budget a small migration even if you stay.
If none of those apply and your evals show parity, the spread in this table is too large to ignore.
The cheapest API in this comparison is DeepSeek V4 Flash, at $0.14 per million input tokens (cache miss) and $0.28 per million output tokens, verified June 11, 2026 on DeepSeek's official pricing page. Cache hits drop input to $0.0028. Among US closed providers, GPT-5.4-nano has the lowest input rate at $0.20 and Gemini 3 Flash Preview has a free tier.
Anthropic does not charge a long-context premium on its current models. Claude Fable 5, Opus 4.8, Opus 4.7, Opus 4.6, and Sonnet 4.6 include the full 1M token context window at standard per-token rates, with no premium above 200K tokens (verified June 11, 2026). Gemini 3.1 Pro Preview and Gemini 2.5 Pro still charge higher rates for prompts above 200K tokens.
Fable 5 costs $10 input / $50 output per MTok versus GPT-5.5 at $5 / $30, so 2x on input and about 1.67x on output. Against GPT-5.5-pro ($30 / $180), Fable 5 is actually the cheaper max-capability option by 3x on input and 3.6x on output. All figures verified June 11, 2026.
Claude models from Opus 4.7 onward, including Fable 5, use a new tokenizer that can produce up to 35% more tokens for the same text compared with older Claude models, per Anthropic's pricing documentation (verified June 11, 2026). Token counts measured on older models do not transfer, so per-MTok rate comparisons against pre-4.7 baselines undercount real spend.
Anthropic, OpenAI, and Google all offer a 50% batch discount on input and output tokens (verified June 11, 2026). DeepSeek's pricing page lists no batch tier, but its standard rates sit below the other providers' batch rates in most tiers anyway.
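The "most tiers" qualifier can be checked directly against the rate table. This Python sketch applies the 50% batch discount to a sample of the mid-tier and cheap-tier models and compares the result with DeepSeek's standard rates. The selection of models and the helper name are choices made for illustration.

```python
BATCH_DISCOUNT = 0.5  # 50% off input and output, per the paragraph above

# Standard (input, output) rates in USD per MTok, from the rate table.
STANDARD = {
    "GPT-5.4": (2.50, 15.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "GPT-5.4-nano": (0.20, 1.25),
    "Claude Haiku 4.5": (1.00, 5.00),
}
DEEPSEEK = {
    "DeepSeek V4 Pro": (0.435, 0.87),
    "DeepSeek V4 Flash": (0.14, 0.28),
}


def batch_rates(rates):
    """Apply the batch discount to an (input, output) rate pair."""
    return tuple(rate * BATCH_DISCOUNT for rate in rates)


for ds_name, (ds_in, ds_out) in DEEPSEEK.items():
    for name, rates in STANDARD.items():
        b_in, b_out = batch_rates(rates)
        print(
            f"{ds_name} vs {name} batch: "
            f"input {'lower' if ds_in < b_in else 'higher'}, "
            f"output {'lower' if ds_out < b_out else 'higher'}"
        )
```

Within matching tiers, the exception in this sample is GPT-5.4-nano, whose batch input rate of $0.10 sits below DeepSeek V4 Flash's standard $0.14, although its batch output rate remains higher.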