TL;DR
Fable 5 prompt caching economics: cache-write vs cache-read pricing, 5-minute vs 1-hour TTL break-even math, and worked agent-loop examples.
Last updated: June 11, 2026
Claude Fable 5 is the most expensive current-generation model Anthropic ships - $10 per million input tokens and $50 per million output, double Opus 4.8 across the board. At those rates, prompt caching stops being a nice optimization and becomes load-bearing economics. A 200K-token agent context costs $2.00 every time you re-send it cold, or $0.20 when it hits the cache. Multiply by fifty loop iterations and the gap is the difference between a $12 run and a $100 run.
We covered the mechanics - breakpoint placement, prefix invalidation, monitoring hit rate - in our prompt caching production guide. This post is purely the money math on Fable 5: what writes and reads actually cost, when the default 5-minute TTL holds, when the 1-hour TTL earns its 2x write price, and the cadence thresholds that decide it for agent loops.
All three cache prices derive from the base input rate with fixed multipliers: 5-minute cache writes cost 1.25x base input, 1-hour writes cost 2x, and cache reads cost 0.1x. Those multipliers are identical across the lineup, so the absolute dollar gaps scale with the model (Anthropic pricing docs, accessed June 11, 2026):
| Model | Base input | 5m cache write | 1h cache write | Cache read | Output |
|---|---|---|---|---|---|
| Claude Fable 5 | $10 | $12.50 | $20 | $1.00 | $50 |
| Claude Opus 4.8 | $5 | $6.25 | $10 | $0.50 | $25 |
| Claude Sonnet 4.6 | $3 | $3.75 | $6 | $0.30 | $15 |
| Claude Haiku 4.5 | $1 | $1.25 | $2 | $0.10 | $5 |
All prices per million tokens (MTok). Two Fable-specific details from the prompt caching docs worth flagging up front:
The write premium is the bet. On the 5-minute tier you pay 1.25x once so that subsequent requests pay 0.1x instead of 1.0x. The docs state it plainly: caching pays off after just one cache read on the 5-minute tier, and after two reads on the 1-hour tier.
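The arithmetic, in multiples of base input price for the same prefix sent twice:
| Strategy | First send | Second send | Total |
|---|---|---|---|
| Uncached | 1.0x | 1.0x | 2.0x |
| 5m cache | 1.25x (write) | 0.1x (read) | 1.35x |
| 1h cache | 2.0x (write) | 0.1x (read) | 2.1x |
The 1-hour tier is still 0.1x behind after one read; a second read puts it at 2.2x against 3.0x uncached.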
The flip side matters just as much: a cache write with zero reads is a pure 25% surcharge (or 100% on the 1-hour tier). If your prefix changes every request - per-user context, timestamps interpolated into the system prompt - caching on Fable 5 costs you an extra $2.50 per million tokens for nothing. The break-even is a single reuse, but the reuse has to actually happen.
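The break-even claim and the zero-read surcharge can be checked with a few lines of Python. This is a minimal sketch that uses only the multipliers quoted above; the function names are ours, for illustration.

```python
# Multipliers relative to base input price, as quoted in the article.
WRITE_5M = 1.25
WRITE_1H = 2.0
READ = 0.1


def cached_cost(write_mult: float, reads: int) -> float:
    """One cache write plus `reads` cache reads, in multiples of base input."""
    return write_mult + reads * READ


def uncached_cost(reads: int) -> float:
    """The same prefix sent cold (1 + reads) times, in multiples of base input."""
    return 1.0 * (1 + reads)


def break_even_reads(write_mult: float) -> int:
    """Smallest number of reads at which caching is strictly cheaper than not caching."""
    reads = 0
    while cached_cost(write_mult, reads) >= uncached_cost(reads):
        reads += 1
    return reads


for label, mult in (("5m", WRITE_5M), ("1h", WRITE_1H)):
    print(label, "break-even reads:", break_even_reads(mult))
    print(label, "surcharge with zero reads:", f"{mult - 1.0:.0%}")
```

The loop treats a tie as "not yet paid off", which matches the docs' framing that the 1-hour tier needs a second read.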
An agent loop re-sends its entire prefix - system prompt, tool definitions, conversation history - on every iteration. The question is never whether to cache; it is whether the gap between iterations stays under the TTL. Because reads refresh the TTL for free, a loop that fires more often than every 5 minutes keeps the default cache alive indefinitely. You never pay the second write.
Most tool-driven loops are well inside that window: a shell command, a file edit, an API call, each completing in seconds to a couple of minutes. The exceptions are the ones that blow the budget - a 7-minute test suite, a human approval gate, a deploy that takes 12 minutes to roll out. Each expiry costs you a fresh write on the full prefix.
Take a coding agent with a 200K-token prefix (large system prompt, tool schemas, repository context) running 50 iterations with tool calls every 30 to 90 seconds:
| Strategy | Calculation | Prefix input cost |
|---|---|---|
| Uncached | 50 x 200K x $10/MTok | $100.00 |
| 5m cache, no expiries | 1 write ($2.50) + 49 reads ($0.20 each) | $12.30 |
| 5m cache, 5 expiries mid-run | 6 writes ($15.00) + 44 reads ($8.80) | $23.80 |
The clean case saves about 88% on prefix input. Each TTL miss costs the difference between a write and a read on 200K tokens - $2.50 versus $0.20, so $2.30 per expiry. Five misses nearly double the caching bill, but the run still beats uncached by a wide margin. In a real loop the history grows each turn, so you also pay small incremental writes for the new suffix - the table isolates the stable-prefix cost, which dominates.
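To rerun the table with your own prefix size and expiry count, a small cost model is enough. The sketch below assumes what the table assumes: a stable prefix, the 5-minute tier, and one extra full write per expiry. It ignores the incremental suffix writes mentioned above, and the function names are hypothetical.

```python
BASE_INPUT_PER_MTOK = 10.00  # Fable 5 base input, from the pricing table
WRITE_5M_MULT = 1.25
READ_MULT = 0.1


def uncached_prefix_cost(prefix_tokens: int, iterations: int) -> float:
    """Dollars spent re-sending the prefix cold on every iteration."""
    return round(iterations * prefix_tokens / 1_000_000 * BASE_INPUT_PER_MTOK, 2)


def cached_prefix_cost(prefix_tokens: int, iterations: int, expiries: int = 0) -> float:
    """Dollars spent on the stable prefix with the 5-minute cache.

    Each expiry turns one would-be read into a full write, so the loop pays
    (1 + expiries) writes and the remaining iterations as reads.
    """
    writes = 1 + expiries
    reads = iterations - writes
    if reads < 0:
        raise ValueError("more writes than iterations")
    mtok = prefix_tokens / 1_000_000
    write_cost = writes * mtok * BASE_INPUT_PER_MTOK * WRITE_5M_MULT
    read_cost = reads * mtok * BASE_INPUT_PER_MTOK * READ_MULT
    return round(write_cost + read_cost, 2)


print(uncached_prefix_cost(200_000, 50))            # uncached row
print(cached_prefix_cost(200_000, 50))              # no expiries
print(cached_prefix_cost(200_000, 50, expiries=5))  # five expiries mid-run
```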
That expiry delta is also the number to know for fan-outs: a cache entry only becomes readable after the first response begins, so parallel agents launched cold against the same 200K prefix each pay the $2.50 write. Ten parallel cold starts cost $25.00 in writes; staggering them behind one warm-up request costs $2.50 + 9 x $0.20 = $4.30.
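The fan-out comparison is the same arithmetic with the writes and reads counted differently. This is a rough sketch that assumes every staggered agent lands inside the TTL of the warm-up request; the helper names are ours.

```python
BASE_INPUT_PER_MTOK = 10.00  # Fable 5 base input
WRITE_5M_MULT = 1.25
READ_MULT = 0.1


def cold_fanout_cost(agents: int, prefix_tokens: int) -> float:
    """All agents start at once against a cold cache, so each pays a full write."""
    write = prefix_tokens / 1_000_000 * BASE_INPUT_PER_MTOK * WRITE_5M_MULT
    return round(agents * write, 2)


def staggered_fanout_cost(agents: int, prefix_tokens: int) -> float:
    """One warm-up request writes the cache; the remaining agents read it."""
    mtok = prefix_tokens / 1_000_000
    write = mtok * BASE_INPUT_PER_MTOK * WRITE_5M_MULT
    read = mtok * BASE_INPUT_PER_MTOK * READ_MULT
    return round(write + (agents - 1) * read, 2)


print(cold_fanout_cost(10, 200_000))
print(staggered_fanout_cost(10, 200_000))
```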
The 1-hour tier costs 0.75x base input more per write than the 5-minute tier ($20 vs $12.50 on Fable 5). Each avoided expiry saves you 1.15x - a $12.50/MTok re-write replaced by a $1.00/MTok read. So the rule of thumb is clean: if you expect even one gap between 5 and 60 minutes during a session, the 1-hour TTL pays for itself. One avoided re-write more than covers the premium.
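In code, the rule of thumb becomes a one-line check over the gaps you expect. The sketch below only builds the request fragment and makes no API call. The `cache_control` shape with `"ttl": "1h"` follows our reading of Anthropic's prompt caching docs, so verify it against the current API reference. The helper names and the conservative choice to treat a gap of exactly 5 minutes as expired are our assumptions.

```python
def cache_control_for(gaps_minutes):
    """Pick a cache_control block from expected inter-request gaps (in minutes).

    Any gap of at least 5 and under 60 minutes would expire the default cache
    but not the 1-hour one, so a single such gap justifies the longer TTL.
    """
    needs_long_ttl = any(5 <= gap < 60 for gap in gaps_minutes)
    if needs_long_ttl:
        return {"type": "ephemeral", "ttl": "1h"}
    return {"type": "ephemeral"}  # default 5-minute TTL


def build_system_block(text, gaps_minutes):
    """Hypothetical helper: a system-prompt content block carrying the chosen TTL."""
    return {
        "type": "text",
        "text": text,
        "cache_control": cache_control_for(gaps_minutes),
    }


# A tight tool loop stays on the default; one 7-minute test run flips it.
print(cache_control_for([0.5, 1.5, 1.0]))
print(cache_control_for([0.5, 7.0, 1.0]))
```

Gaps of an hour or more do not trigger the 1-hour tier here, because neither TTL survives them.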
A developer pairs with a Fable 5 assistant carrying a 100K-token prefix, sending a message roughly every 8 minutes across a 2-hour session - 15 requests total:
| Strategy | Calculation | Prefix input cost |
|---|---|---|
| Uncached | 15 x 100K x $10/MTok | $15.00 |
| 5m cache | 15 writes x $1.25 (every request misses) | $18.75 |
| 1h cache | 1 write ($2.00) + 14 reads ($0.10 each) | $3.40 |
This is the sharpest result in the whole post: with 8-minute gaps, the 5-minute cache expires before every single request, so you pay the 1.25x write premium fifteen times and read nothing. The misconfigured cache costs 25% more than not caching at all. The 1-hour tier, refreshed free on each read, runs the whole session on one write and lands 77% under the uncached cost.
So the TTL decision reduces to one measurement: your inter-request gap distribution. Under 5 minutes, take the default. Between 5 and 60 minutes, pay the 2x write. Over an hour of idle, the cache is gone either way - consider a pre-warm request or accept the cold write.
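If you have a log of inter-request gaps, you can replay it against both TTLs before committing. This is a minimal simulation under the article's assumptions: every request, read or write, restarts the TTL clock, and a gap at or beyond the TTL forces a full re-write. The function name and defaults are illustrative.

```python
def session_prefix_cost(gaps_minutes, prefix_tokens, ttl_minutes, write_mult,
                        base_per_mtok=10.00, read_mult=0.1):
    """Replay a session and return the stable-prefix input cost in dollars.

    gaps_minutes holds the idle time before each request after the first,
    so a session of N requests has N - 1 gaps.
    """
    mtok = prefix_tokens / 1_000_000
    write = mtok * base_per_mtok * write_mult
    read = mtok * base_per_mtok * read_mult
    total = write  # the first request always writes the cache
    for gap in gaps_minutes:
        total += read if gap < ttl_minutes else write
    return round(total, 2)


# The pairing session above: 15 requests, 8 minutes apart, 100K-token prefix.
gaps = [8] * 14
print(session_prefix_cost(gaps, 100_000, ttl_minutes=5, write_mult=1.25))
print(session_prefix_cost(gaps, 100_000, ttl_minutes=60, write_mult=2.0))
```

Feeding it your real gap distribution, rather than an average, matters because a single long stall costs a full write even when the median gap is short.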
A few things change the numbers specifically on Fable 5 relative to the models you may be migrating from:
For a sense of what disciplined caching looks like at scale: Simon Willison reported a single agent session that consumed 78.2 million tokens for $99.26 in his release-day writeup. That averages about $1.27 per million tokens on a $10/$50 model - a blend that only pencils out if the overwhelming majority of those tokens billed at or near the $1/MTok cache-read rate. Heavy caching is not an edge case on Fable 5; it is how real agent sessions stay affordable, and it is worth verifying with cache_read_input_tokens in your own usage data rather than assuming. The full picture of modeling these costs lives in Fable 5 production cost modeling.
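The "overwhelming majority" claim can be bounded with the reported totals alone. This is a back-of-envelope sketch that assumes list prices with no other discounts, and uses the fact that every non-read rate on Fable 5 is at least the $10 base input price. The token mix itself is not in the source, so the result is only an upper bound on the non-read share.

```python
TOTAL_MTOK = 78.2    # tokens consumed, in millions (reported)
TOTAL_COST = 99.26   # dollars (reported)

READ_RATE = 1.00             # $/MTok for cache reads
CHEAPEST_OTHER_RATE = 10.00  # $/MTok base input; writes and output cost more

blended = TOTAL_COST / TOTAL_MTOK

# If a share x of tokens billed at >= $10 and the rest at $1, then
# blended >= 10x + 1(1 - x), which caps x from above.
max_non_read_share = (blended - READ_RATE) / (CHEAPEST_OTHER_RATE - READ_RATE)

print(f"blended rate: ${blended:.2f}/MTok")
print(f"non-read share of tokens: at most {max_non_read_share:.1%}")
```

Under those assumptions, roughly 3% of the session's tokens at most could have billed at anything other than the cache-read rate.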
Yes. The TTL is refreshed only when the cached content is read by a new request. If your loop stalls past 5 minutes - a long test suite, a human approval step - the next request pays a full cache write ($12.50/MTok on Fable 5) instead of a read ($1.00/MTok). One expected gap in the 5-to-60-minute range is enough to justify the 1-hour TTL.
Yes, for tight loops. If every request lands within 5 minutes of the previous one, the free refresh keeps the default cache alive forever and the 1-hour tier just raises your first write from 1.25x to 2x base input ($20 vs $12.50 per MTok) for no benefit. It also needs two reads to break even versus one for the 5-minute tier.
Check the usage object on every response: cache_read_input_tokens is what you paid 0.1x for, cache_creation_input_tokens is what you paid the write premium for, and input_tokens covers only tokens after your last breakpoint. If reads stay at zero across repeated requests, a silent invalidator is changing your prefix - our production caching guide walks the audit.
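A per-request audit along those lines might look like the sketch below. It takes the usage fields as a plain dict (for example, parsed from the raw JSON response), prices them at the Fable 5 rates from the table, and reports a hit rate. The function name, the single `write_mult` parameter, and the sample numbers are our assumptions.

```python
def audit_usage(usage: dict, write_mult: float = 1.25) -> dict:
    """Price one response's usage at Fable 5 rates and compute the cache hit rate."""
    base, output_rate, read_mult = 10.00, 50.00, 0.1  # $/MTok and read multiplier

    uncached = usage.get("input_tokens") or 0
    written = usage.get("cache_creation_input_tokens") or 0
    read = usage.get("cache_read_input_tokens") or 0
    output = usage.get("output_tokens") or 0

    input_cost = (uncached * base
                  + written * base * write_mult
                  + read * base * read_mult) / 1_000_000
    output_cost = output * output_rate / 1_000_000

    total_input = uncached + written + read
    return {
        "input_cost": round(input_cost, 4),
        "output_cost": round(output_cost, 4),
        "hit_rate": read / total_input if total_input else 0.0,
    }


# Hypothetical warm request: 200K-token prefix read from cache, 1K new input.
sample = {
    "input_tokens": 1_000,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 200_000,
    "output_tokens": 2_000,
}
print(audit_usage(sample))
```

Pass `write_mult=2.0` for requests that write on the 1-hour tier. A hit rate that stays at zero across repeated requests is the signal to go hunting for a silent invalidator.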
The multipliers are the same, but the minimum cacheable prefix differs: 512 tokens on the Claude API versus 1,024 on Amazon Bedrock per Anthropic's docs. Partner platforms also set their own regional pricing: on Bedrock and Vertex AI, regional and multi-region endpoints carry a 10% premium over global ones.
No. Caching applies to input tokens only. Output remains $50/MTok regardless of caching, and on long agentic runs output plus thinking is often the larger line item. Caching is the input lever; effort settings and task scoping are the output levers.