Prompt Caching Economics on Fable 5: When the 5-Minute TTL Pays

Developers Digest • June 11, 2026 • 10 min read

TL;DR

Fable 5 prompt caching economics: cache-write vs cache-read pricing, 5-minute vs 1-hour TTL break-even math, and worked agent-loop examples.

Last updated: June 11, 2026

Claude Fable 5 is the most expensive current-generation model Anthropic ships - $10 per million input tokens and $50 per million output, double Opus 4.8 across the board. At those rates, prompt caching stops being a nice optimization and becomes load-bearing economics. A 200K-token agent context costs $2.00 every time you re-send it cold, or $0.20 when it hits the cache. Multiply by fifty loop iterations and the gap is the difference between a $12 run and a $100 run.

We covered the mechanics - breakpoint placement, prefix invalidation, monitoring hit rate - in our prompt caching production guide. This post is purely the money math on Fable 5: what writes and reads actually cost, when the default 5-minute TTL holds, when the 1-hour TTL earns its 2x write premium, and the cadence thresholds that decide it for agent loops.

The Fable 5 Caching Rate Card

All three cache prices derive from the base input rate with fixed multipliers: 5-minute cache writes cost 1.25x base input, 1-hour writes cost 2x, and cache reads cost 0.1x. Those multipliers are identical across the lineup, so the absolute dollar gaps scale with the model (Anthropic pricing docs, accessed June 11, 2026):

Model	Base input	5m cache write	1h cache write	Cache read	Output
Claude Fable 5	$10	$12.50	$20	$1.00	$50
Claude Opus 4.8	$5	$6.25	$10	$0.50	$25
Claude Sonnet 4.6	$3	$3.75	$6	$0.30	$15
Claude Haiku 4.5	$1	$1.25	$2	$0.10	$5

All prices are per million tokens (MTok). Two details from the prompt caching docs are worth flagging up front, the first of them specific to Fable 5:

The minimum cacheable prefix on Fable 5 is 512 tokens - the lowest in the current lineup (Opus 4.8 needs 1,024; Haiku 4.5 needs 4,096). Even a modest system prompt caches on Fable 5. On Amazon Bedrock the Fable 5 minimum is 1,024 tokens.
Every cache read refreshes the TTL at no extra cost. A read bills at $1/MTok and resets the clock, on both the 5-minute and 1-hour tiers. This one mechanic drives most of the cadence math below.
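The rate card can be reproduced from the multipliers alone. The sketch below is illustrative only: `CACHE_MULTIPLIERS`, `cache_rates`, and `prefix_cost` are names we made up for this post, and the numbers are the ones quoted above rather than anything fetched from an API.

```python
# Multipliers quoted in this post, applied to a model's base input price.
CACHE_MULTIPLIERS = {
    "write_5m": 1.25,  # 5-minute cache write
    "write_1h": 2.0,   # 1-hour cache write
    "read": 0.1,       # cache read
}


def cache_rates(base_input_per_mtok: float) -> dict:
    """Derive USD-per-MTok cache prices from a base input rate."""
    return {
        name: round(base_input_per_mtok * mult, 4)
        for name, mult in CACHE_MULTIPLIERS.items()
    }


def prefix_cost(tokens: int, rate_per_mtok: float) -> float:
    """Cost in USD of sending `tokens` at a given per-MTok rate."""
    return tokens / 1_000_000 * rate_per_mtok


fable = cache_rates(10.0)  # Fable 5 base input: $10/MTok
print(fable)
print(prefix_cost(200_000, 10.0), prefix_cost(200_000, fable["read"]))
```

Swapping in a base rate of 5, 3, or 1 gives the Opus, Sonnet, and Haiku rows of the table.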

Break-Even: When a Cache Write Pays for Itself

The write premium is the bet. On the 5-minute tier you pay 1.25x once so that subsequent requests pay 0.1x instead of 1.0x. The docs state it plainly: caching pays off after just one cache read on the 5-minute tier, and after two reads on the 1-hour tier.

The arithmetic, in multiples of base input price for the same prefix sent twice:

Uncached, two requests: 1.0x + 1.0x = 2.0x
5-minute cache: 1.25x write + 0.1x read = 1.35x. Wins on the second request.
1-hour cache: 2.0x write + 0.1x read = 2.1x. Slightly loses on the second request, wins on the third (2.2x vs 3.0x uncached).

The flip side matters just as much: a cache write with zero reads is a pure 25% surcharge (or 100% on the 1-hour tier). If your prefix changes every request - per-user context, timestamps interpolated into the system prompt - caching on Fable 5 costs you an extra $2.50 per million tokens for nothing. The break-even is a single reuse, but the reuse has to actually happen.
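The same break-even arithmetic can be checked in a few lines. This is a minimal sketch that assumes the best case for caching, one write followed by reads with no expiries, and works in multiples of the base input price; the function names are ours.

```python
def uncached_cost(n_requests: int) -> float:
    """Cost of n requests with no caching, in multiples of base input."""
    return 1.0 * n_requests


def cached_cost(n_requests: int, write_mult: float, read_mult: float = 0.1) -> float:
    """One cache write on the first request, reads afterwards (no expiries)."""
    if n_requests <= 0:
        return 0.0
    return write_mult + read_mult * (n_requests - 1)


def break_even_requests(write_mult: float, read_mult: float = 0.1) -> int:
    """Smallest request count at which caching is strictly cheaper."""
    if read_mult >= 1.0:
        raise ValueError("reads must be cheaper than base input to break even")
    n = 1
    while cached_cost(n, write_mult, read_mult) >= uncached_cost(n):
        n += 1
    return n


print(break_even_requests(1.25))  # 5-minute tier
print(break_even_requests(2.0))   # 1-hour tier
```

The 5-minute tier comes out ahead on the second request and the 1-hour tier on the third, matching the one-read and two-read figures from the docs.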

Agent-Loop Cadence: Why the 5-Minute Default Usually Holds

An agent loop re-sends its entire prefix - system prompt, tool definitions, conversation history - on every iteration. The question is never whether to cache, it is whether the gap between iterations stays under the TTL. Because reads refresh the TTL for free, a loop that fires more often than every 5 minutes keeps the default cache alive indefinitely. You never pay the second write.

Most tool-driven loops are well inside that window: a shell command, a file edit, an API call, each completing in seconds to a couple of minutes. The exceptions are the ones that blow the budget - a 7-minute test suite, a human approval gate, a deploy that takes 12 minutes to roll out. Each expiry costs you a fresh write on the full prefix.

Worked example: a 50-iteration Fable 5 loop

Take a coding agent with a 200K-token prefix (large system prompt, tool schemas, repository context) running 50 iterations with tool calls every 30 to 90 seconds:

Strategy	Calculation	Prefix input cost
Uncached	50 x 200K x $10/MTok	$100.00
5m cache, no expiries	1 write ($2.50) + 49 reads ($0.20 each)	$12.30
5m cache, 5 expiries mid-run	6 writes ($15.00) + 44 reads ($8.80)	$23.80

The clean case saves about 88% on prefix input. Each TTL miss costs the difference between a write and a read on 200K tokens - $2.50 versus $0.20, so roughly $2.30 per expiry. Five misses nearly double the caching bill, yet the run still beats uncached by a wide margin. In a real loop the history grows each turn, so you also pay small incremental writes for the new suffix - the table isolates the stable-prefix cost, which dominates.
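To rerun the table with your own prefix size or expiry count, a rough model is enough. The sketch below assumes a fixed-size prefix and treats every expiry as one extra full write, exactly as the table does; `loop_prefix_cost` is a hypothetical helper, not part of any SDK.

```python
def loop_prefix_cost(
    iterations: int,
    prefix_tokens: int,
    expiries: int = 0,
    base_per_mtok: float = 10.0,  # Fable 5 base input rate from the rate card
    write_mult: float = 1.25,     # 5-minute cache write
    read_mult: float = 0.1,       # cache read
) -> float:
    """Stable-prefix input cost in USD for an agent loop on the 5-minute tier."""
    writes = 1 + expiries          # the cold start plus one re-write per expiry
    reads = iterations - writes    # every other iteration hits the cache
    if reads < 0:
        raise ValueError("more writes than iterations")
    mtok = prefix_tokens / 1_000_000
    return mtok * base_per_mtok * (writes * write_mult + reads * read_mult)


uncached = 50 * 200_000 / 1_000_000 * 10.0
clean = loop_prefix_cost(50, 200_000)
bumpy = loop_prefix_cost(50, 200_000, expiries=5)
print(uncached, round(clean, 2), round(bumpy, 2))
print(f"clean-case saving: {1 - clean / uncached:.0%}")
```

It deliberately ignores the incremental writes for growing history, so treat the result as a floor on real spend.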

That expiry delta is also the number to know for fan-outs: a cache entry only becomes readable after the first response begins, so parallel agents launched cold against the same 200K prefix each pay the $2.50 write. Ten parallel cold starts cost $25.00 in writes; staggering them behind one warm-up request costs $2.50 + 9 x $0.20 = $4.30.
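The fan-out comparison follows the same pattern. This sketch assumes the simplest possible staggering: one warm-up request pays the write, and every other agent starts late enough to read the entry. Both function names are illustrative.

```python
BASE_PER_MTOK = 10.0  # Fable 5 base input rate
WRITE_5M_MULT = 1.25
READ_MULT = 0.1


def cold_fanout_cost(agents: int, prefix_tokens: int) -> float:
    """Every agent launches cold and pays its own cache write."""
    mtok = prefix_tokens / 1_000_000
    return agents * mtok * BASE_PER_MTOK * WRITE_5M_MULT


def staggered_fanout_cost(agents: int, prefix_tokens: int) -> float:
    """One warm-up request writes the cache; the remaining agents read it."""
    if agents < 1:
        return 0.0
    mtok = prefix_tokens / 1_000_000
    write = mtok * BASE_PER_MTOK * WRITE_5M_MULT
    read = mtok * BASE_PER_MTOK * READ_MULT
    return write + (agents - 1) * read


print(cold_fanout_cost(10, 200_000))
print(round(staggered_fanout_cost(10, 200_000), 2))
```

The trade is latency for money: the warm-up request has to begin responding before the rest of the fleet launches.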

When the 1-Hour TTL Pays

The 1-hour tier costs 0.75x base input more per write than the 5-minute tier ($20 vs $12.50 on Fable 5). Each avoided expiry saves you 1.15x - a $12.50/MTok re-write replaced by a $1.00/MTok read. So the rule of thumb is clean: if you expect even one gap between 5 and 60 minutes during a session, the 1-hour TTL pays for itself. One avoided re-write more than covers the premium.

Worked example: a human-in-the-loop session

A developer pairs with a Fable 5 assistant carrying a 100K-token prefix, sending a message roughly every 8 minutes across a 2-hour session - 15 requests total:

Strategy	Calculation	Prefix input cost
Uncached	15 x 100K x $10/MTok	$15.00
5m cache	15 writes x $1.25 (every request misses)	$18.75
1h cache	1 write ($2.00) + 14 reads ($0.10 each)	$3.40

This is the sharpest result in the whole post: with 8-minute gaps, the 5-minute cache expires before every single request, so you pay the 1.25x write premium fifteen times and read nothing. The misconfigured cache costs 25% more than not caching at all. The 1-hour tier, refreshed free on each read, runs the whole session on one write and lands 77% under the uncached cost.
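A small simulation makes the failure mode easy to reproduce for other cadences. It is a sketch under two assumptions of ours: an entry is readable only if the gap since the previous request is strictly shorter than the TTL, and every request, read or write, restarts the clock. `session_prefix_cost` is a made-up helper.

```python
def session_prefix_cost(
    gaps_min: list,
    prefix_tokens: int,
    ttl_min: float,
    write_mult: float,
    base_per_mtok: float = 10.0,  # Fable 5 base input rate
    read_mult: float = 0.1,
) -> float:
    """Prefix input cost in USD for a session described by inter-request gaps."""
    mtok = prefix_tokens / 1_000_000
    write = mtok * base_per_mtok * write_mult
    read = mtok * base_per_mtok * read_mult
    total = write  # the first request is always a cold write
    for gap in gaps_min:
        # A gap at or beyond the TTL means the entry expired: pay a fresh write.
        total += read if gap < ttl_min else write
    return total


gaps = [8] * 14  # 15 requests, one every 8 minutes
print(round(session_prefix_cost(gaps, 100_000, ttl_min=5, write_mult=1.25), 2))
print(round(session_prefix_cost(gaps, 100_000, ttl_min=60, write_mult=2.0), 2))
```

Feeding it a gap list pulled from your own request logs is a quick way to see which side of the line a workload falls on.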

So the TTL decision reduces to one measurement: your inter-request gap distribution. Under 5 minutes, take the default. Between 5 and 60 minutes, pay the 2x write. Over an hour of idle, the cache is gone either way - consider a pre-warm request or accept the cold write.
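That rule of thumb can be written down directly. The helper below encodes only the thresholds stated in this post and nothing more; it is a hypothetical starting point, and a session with many gaps over an hour deserves a proper cost comparison rather than a threshold check.

```python
def pick_ttl(expected_gaps_min: list) -> str:
    """Return "5m" or "1h" using this post's rule of thumb.

    Any expected gap of at least 5 but under 60 minutes justifies the
    1-hour tier. Gaps under 5 minutes are covered by the default, and
    gaps of an hour or more lose the cache on either tier.
    """
    rescued_by_1h = [g for g in expected_gaps_min if 5 <= g < 60]
    return "1h" if rescued_by_1h else "5m"


print(pick_ttl([0.5, 1.5, 2]))  # tight tool loop
print(pick_ttl([1, 8, 2]))      # one slow test suite mid-run
print(pick_ttl([90, 1, 1]))     # long idle, then a tight loop
```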

Fable-Specific Wrinkles in the Math

A few things change the numbers specifically on Fable 5 relative to the models you may be migrating from:

The tokenizer inflates every figure. Fable 5 uses the tokenizer introduced with Opus 4.7, and the same text produces roughly 30% more tokens than on pre-4.7 models per the models overview (the pricing page says up to 35% for fixed text). A prefix you measured at 150K tokens on Sonnet 4.6 is closer to 195K on Fable 5, and every write, read, and expiry scales with it. Comparisons against Opus 4.8 and 4.7 are clean - same tokenizer - but budgets carried over from older models need re-baselining, as we covered in migrating to Claude Fable 5.
Caching only touches input. Output is $50/MTok with no cache lever, and long-horizon Fable runs generate a lot of it. Caching narrows the input gap dramatically; it does nothing for the output side, which is where the 2x premium over Opus 4.8 fully applies.
Multipliers stack with the Batch API. Anthropic's pricing page confirms cache multipliers stack with the 50% batch discount, so a batch cache read on Fable 5 prices at $0.50/MTok. For non-urgent high-volume work, that combination is the floor - see our Batch API production guide.
A well-cached Fable read is cheaper than cheap models' cold input. $1/MTok for cached Fable 5 input undercuts Sonnet 4.6's $3/MTok uncached rate even after the tokenizer penalty. Cache discipline does not close the Fable premium, but it moves the bulk of input spend below mid-tier cold pricing.
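Three of these wrinkles are simple enough to sanity-check in code. The sketch below uses the roughly 30% tokenizer inflation and the 50% batch discount quoted above; the function names are ours, and the inflation figure is an average, so measure your own prompts before budgeting on it.

```python
def rebaseline_tokens(old_tokens: int, inflation: float = 0.30) -> int:
    """Estimate a Fable 5 token count from a pre-4.7 tokenizer measurement."""
    return round(old_tokens * (1 + inflation))


def batch_cache_read_rate(
    base_per_mtok: float, read_mult: float = 0.1, batch_discount: float = 0.5
) -> float:
    """Per-MTok cache read price with the batch discount stacked on top."""
    return base_per_mtok * read_mult * (1 - batch_discount)


fable_tokens = rebaseline_tokens(150_000)
print(fable_tokens)
print(batch_cache_read_rate(10.0))

# Cached Fable 5 read vs cold Sonnet 4.6 input for the same text,
# expressed per million *Sonnet-measured* tokens.
cached_fable = 1.00 * 1.30  # $1/MTok read, inflated by the tokenizer penalty
cold_sonnet = 3.00
print(cached_fable < cold_sonnet)
```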

For a sense of what disciplined caching looks like at scale: Simon Willison reported a single agent session that consumed 78.2 million tokens for $99.26 in his release-day writeup. That averages about $1.27 per million tokens on a $10/$50 model - a blend that only pencils out if the overwhelming majority of those tokens billed at or near the $1/MTok cache-read rate. Heavy caching is not an edge case on Fable 5; it is how real agent sessions stay affordable, and it is worth verifying with cache_read_input_tokens in your own usage data rather than assuming. The full picture of modeling these costs lives in Fable 5 production cost modeling.
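To see how lopsided that blend has to be, here is a deliberately crude two-rate model. It assumes every token billed at either the $10 base input rate or the $1 cache-read rate and ignores output and cache writes entirely, which is our simplification and not something the writeup states. Since output costs more than either rate, the real read share would need to be higher still.

```python
TOTAL_TOKENS = 78_200_000  # session size reported in the writeup
TOTAL_USD = 99.26          # session cost reported in the writeup


def blended_rate(total_usd: float, total_tokens: int) -> float:
    """Average USD per million tokens across the whole session."""
    return total_usd / (total_tokens / 1_000_000)


def implied_read_share(blended: float, base: float = 10.0, read: float = 1.0) -> float:
    """Fraction of tokens at the read rate under the two-rate assumption."""
    return (base - blended) / (base - read)


rate = blended_rate(TOTAL_USD, TOTAL_TOKENS)
print(round(rate, 2))
print(f"{implied_read_share(rate):.0%} of tokens at the cache-read rate")
```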

FAQ

Does the 5-minute cache really expire mid-run if a tool call takes too long?

Yes. The TTL is refreshed only when the cached content is read by a new request. If your loop stalls past 5 minutes - a long test suite, a human approval step - the next request pays a full cache write ($12.50/MTok on Fable 5) instead of a read ($1.00/MTok). One expected gap in the 5-to-60-minute range is enough to justify the 1-hour TTL.

Is the 1-hour TTL ever the wrong choice?

Yes, for tight loops. If every request lands within 5 minutes of the previous one, the free refresh keeps the default cache alive indefinitely and the 1-hour tier just raises your first write by 60% ($20 vs $12.50 per MTok) for no benefit. It also needs two reads to break even versus one for the 5-minute tier.

How do I verify the cache is actually saving money?

Check the usage object on every response: cache_read_input_tokens is what you paid 0.1x for, cache_creation_input_tokens is what you paid the write premium for, and input_tokens covers only tokens after your last breakpoint. If reads stay at zero across repeated requests, a silent invalidator is changing your prefix - our production caching guide walks the audit.
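A minimal audit over those three fields might look like the sketch below. It assumes you have already converted the response's usage object into a plain dict, that all writes were billed at the 5-minute rate, and Fable 5 pricing from the rate card; `prefix_spend` and the sample numbers are invented for illustration.

```python
def prefix_spend(
    usage: dict,
    base_per_mtok: float = 10.0,  # Fable 5 base input rate
    write_mult: float = 1.25,     # assumes 5-minute writes only
    read_mult: float = 0.1,
) -> dict:
    """Summarize input billing for one response's usage fields."""
    read = usage.get("cache_read_input_tokens", 0)
    created = usage.get("cache_creation_input_tokens", 0)
    fresh = usage.get("input_tokens", 0)  # tokens after the last breakpoint
    per_token = base_per_mtok / 1_000_000
    total = read + created + fresh
    billed = per_token * (read * read_mult + created * write_mult + fresh)
    return {
        "billed_usd": billed,
        "uncached_usd": per_token * total,  # what the same input costs cold
        "hit_rate": read / total if total else 0.0,
    }


# Hypothetical usage from a warm loop iteration.
sample = {
    "cache_read_input_tokens": 200_000,
    "cache_creation_input_tokens": 1_500,
    "input_tokens": 300,
}
report = prefix_spend(sample)
print({k: round(v, 4) for k, v in report.items()})
```

Logging the hit rate per request and alerting when it drops to zero catches a silent invalidator much faster than waiting for the invoice.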

Do Fable 5 cache prices differ on Bedrock or Vertex?

The multipliers are the same, but the minimum cacheable prefix differs: 512 tokens on the Claude API versus 1,024 on Amazon Bedrock per Anthropic's docs. Partner platforms also set their own regional pricing: on Bedrock and Vertex AI, regional and multi-region endpoints carry a 10% premium over global ones.

Does prompt caching help with Fable 5's output costs?

No. Caching applies to input tokens only. Output remains $50/MTok regardless of caching, and on long agentic runs output plus thinking is often the larger line item. Caching is the input lever; effort settings and task scoping are the output levers.

Sources

Anthropic pricing documentation - model rate card, cache multipliers, batch stacking, data residency (accessed June 11, 2026)
Anthropic prompt caching documentation - TTL refresh behavior, per-model minimum cacheable prefix, breakpoints, usage fields, concurrent-request timing (accessed June 11, 2026)
Anthropic models overview - Fable 5 specs, GA date, tokenizer note (accessed June 11, 2026)
Simon Willison: Initial impressions of Claude Fable 5 - independent pricing confirmation and the 78.2M-token session figure (accessed June 11, 2026)
