TL;DR
Claude Fable 5 latency measured: 109 seconds to first token at max effort vs 1.4s for Sonnet 4.6. When slow is fine, when it hurts, and how to route around it.
Last updated: June 11, 2026
Every ranking comparison published since Claude Fable 5 launched on June 9 leads with benchmark scores, pricing, and context window. Almost none mention latency - strange, because it is the first thing you notice in actual use. Simon Willison, after roughly 5.5 hours of release-day testing, summed it up in one line: "It's slow, expensive and has been quite happily churning through everything I've thrown at it so far."
So is Claude Fable 5 slow? Yes in one specific dimension, no in another, and whether either matters depends on how you deploy it. This post puts measured numbers on the question and builds a routing guide for when the wait is worth it and when to hand the request to Opus 4.8 or Sonnet 4.6 instead.
Artificial Analysis benchmarks models through the live Anthropic API and publishes two distinct speed metrics: output speed (tokens per second once the model is generating) and time to first answer token (TTFT), which for reasoning models includes thinking time.
Here is the head-to-head from the Artificial Analysis model pages, accessed June 11, 2026. Note the configurations: Fable 5 and Opus 4.8 were tested with adaptive reasoning at max effort, Sonnet 4.6 in its non-reasoning configuration at high effort.
| Metric | Claude Fable 5 (max effort) | Claude Opus 4.8 (max effort) | Claude Sonnet 4.6 (non-reasoning) |
|---|---|---|---|
| Output speed | 63.4 tokens/sec | 59.8 tokens/sec | 42.4 tokens/sec |
| Time to first answer token | 109.12s | 60.92s | 1.44s |
| Tier median TTFT | 2.66s | 2.66s | 1.56s |
| Input / output price per MTok | $10 / $50 | $5 / $25 | $3 / $15 |
Two things jump out. First, Fable 5's raw generation speed is fine - 63.4 tokens per second is above the tier median of 62.7 and slightly faster than Opus 4.8's 59.8. Second, the 109-second time to first answer token is enormous: roughly 41x the tier median, and about 1.8x Opus 4.8's already-long 60.92 seconds at the same max-effort setting. By Artificial Analysis's end-to-end math (TTFT plus generation time for a 500-token answer), a max-effort Fable 5 response takes roughly two minutes to complete, whereas Sonnet 4.6 starts answering in under a second and a half.
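The ratios and the end-to-end figure follow directly from the table. Here is a minimal sketch of that arithmetic, assuming end-to-end time is simply TTFT plus a 500-token answer at the measured output speed. The constant and function names are illustrative, not part of any API.

```python
# Figures from the Artificial Analysis table (accessed June 11, 2026).
FABLE_TTFT_S = 109.12
OPUS_TTFT_S = 60.92
TIER_MEDIAN_TTFT_S = 2.66
FABLE_TOKENS_PER_S = 63.4


def end_to_end_seconds(ttft_s: float, tokens_per_s: float, answer_tokens: int = 500) -> float:
    """Time to first answer token plus the time to generate the answer."""
    return ttft_s + answer_tokens / tokens_per_s


vs_tier_median = FABLE_TTFT_S / TIER_MEDIAN_TTFT_S
vs_opus = FABLE_TTFT_S / OPUS_TTFT_S
fable_total_s = end_to_end_seconds(FABLE_TTFT_S, FABLE_TOKENS_PER_S)

print(f"TTFT vs tier median: {vs_tier_median:.0f}x")    # 41x
print(f"TTFT vs Opus 4.8:    {vs_opus:.1f}x")           # 1.8x
print(f"500-token answer:    {fable_total_s:.0f}s")     # 117s
```

Almost all of the roughly 117 seconds is the wait before the first token; generating the 500 tokens themselves takes under 8 seconds.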
The 109 seconds is not network overhead or slow inference. It is thinking time. Fable 5 runs adaptive thinking always on - thinking: {"type": "disabled"} is not supported - and decides how deeply to reason based on the task and the effort setting. Artificial Analysis benchmarks the max-effort configuration, so 109 seconds is the ceiling case, not the typical request. The API default is high, one notch down, and Anthropic's effort documentation calls effort "the primary control for trading off intelligence, latency, and cost on Claude Fable 5."
Willison's release-day pelican benchmark shows how effort scales the work. The same one-line SVG prompt produced 1,929 output tokens at low effort, 2,290 at medium, 2,057 at high, 5,992 at xhigh, and 14,430 at max - about 7.5x the tokens of low effort, and at 63 tokens per second, every extra token is wall-clock time.
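To put those token counts in wall-clock terms, here is a rough sketch. It assumes every output token is produced at the 63.4 tokens per second measured at max effort, which is a simplification: speed at other effort levels was not benchmarked here. Treat the seconds as estimates, not measurements.

```python
# Output tokens Willison reported for the same SVG prompt at each effort level.
PELICAN_TOKENS = {"low": 1929, "medium": 2290, "high": 2057, "xhigh": 5992, "max": 14430}

# Assumption: all output tokens stream at the max-effort benchmark speed.
TOKENS_PER_S = 63.4


def generation_seconds(tokens: int, tokens_per_s: float = TOKENS_PER_S) -> float:
    """Estimated time to produce `tokens` output tokens at a constant speed."""
    return tokens / tokens_per_s


baseline = PELICAN_TOKENS["low"]
for level, tokens in PELICAN_TOKENS.items():
    ratio = tokens / baseline
    print(f"{level:>6}: {tokens:>6} tokens  {ratio:4.1f}x low  ~{generation_seconds(tokens):.0f}s")
```

Under that assumption, low effort comes out at about half a minute of generation and max at a little under four minutes. The single sample is also not monotonic: high produced fewer tokens than medium, so one run per level is a weak basis for fine-grained comparisons.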
So the precise answer to "is Fable 5 slow" is: the model thinks long before it speaks, and how long is substantially under your control. What you cannot do is turn thinking off entirely.
For the workloads Fable 5 was actually built for, time to first token is close to irrelevant.
Long autonomous turns. Anthropic positions Fable 5 as its most capable model for "the most demanding reasoning and long-horizon agentic work." In an agent loop that runs 40 minutes and makes 200 tool calls, the thinking pause before each step is absorbed into the run time, because nobody is waiting on any individual first token. First-attempt success is what matters, because a failed run costs the elapsed time plus the retry. Our Fable 5 vs Opus 4.8 decision guide covers the quality side of that bet.
Overnight and unattended runs. If the agent kicks off at 11 pm and you read the results at 8 am, the difference between a 2-second and a 109-second first token is exactly zero. This is where Fable 5's profile - slow to start, thorough, strong over long horizons - is purely an asset. The overnight agents workflow post covers structuring these runs, and long-running agents need harnesses covers the guardrails.
Work measured in days, not seconds. Willison's most telling anecdote is not a latency number. He handed Fable 5 a feature for his Datasette Agent project; it shipped the feature plus four supporting improvements to his underlying LLM library. His verdict: "I spent several hours on it today, but it feels like several days' worth of work." When the unit of value is days of engineering compressed into hours, a model that takes minutes per turn is fast, not slow.
Batch and async pipelines. Anything queued - report generation, codebase audits, scheduled analysis jobs - tolerates arbitrary first-token latency. Cost matters more than speed here; see our Fable 5 pricing and cost-per-task breakdown.
The flip side is just as clear: some workloads treat a long time to first token as a dealbreaker, not an inconvenience.
Interactive chat and user-facing products. No user waits 109 seconds, or even 20, staring at a spinner. Sonnet 4.6 in non-reasoning mode starts streaming in 1.44 seconds. If your product surfaces model output to users in real time, Fable 5 at high effort is the wrong default, full stop.
Tight developer loops. Pair-programming sessions, quick refactors, "explain this error" queries - the answer's value decays with every second of waiting. Anthropic's own models overview rates comparative latency as Moderate for Opus 4.8, Fast for Sonnet 4.6, and Fastest for Haiku 4.5. Fable 5 does not appear in that latency table at all.
High-throughput parallel work. Fan-out workloads multiply latency by volume. A subagent that takes two minutes to start answering blocks the whole orchestration graph. Anthropic's effort docs recommend low effort for subagent roles, and a cheaper, faster model is usually the better fit altogether.
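A simple model shows how fan-out multiplies latency. It assumes every subagent call takes the same wall-clock time and that at most `concurrency` calls run at once; real runs vary per call, so this is a sketch rather than a prediction. `fanout_wall_clock_s` is a hypothetical helper, and the per-call times reuse the measured TTFT plus a 500-token answer.

```python
import math


def fanout_wall_clock_s(tasks: int, per_task_s: float, concurrency: int) -> float:
    """Wall-clock time for `tasks` equal-length calls run in waves of `concurrency`."""
    if tasks < 0 or concurrency < 1:
        raise ValueError("tasks must be >= 0 and concurrency must be >= 1")
    waves = math.ceil(tasks / concurrency)
    return waves * per_task_s


# Per-call time = measured TTFT + 500 tokens at the measured output speed.
fable_call_s = 109.12 + 500 / 63.4   # Fable 5, max effort
sonnet_call_s = 1.44 + 500 / 42.4    # Sonnet 4.6, non-reasoning

# Illustrative workload: 100 subagent calls, 10 running at a time.
slow = fanout_wall_clock_s(100, fable_call_s, concurrency=10)
fast = fanout_wall_clock_s(100, sonnet_call_s, concurrency=10)
print(f"Fable 5 workers:    {slow / 60:.1f} min")
print(f"Sonnet 4.6 workers: {fast / 60:.1f} min")
```

With these inputs the gap is roughly twenty minutes against a little over two, and it grows linearly with the number of waves.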
| You are | Your latency tolerance | Route to |
|---|---|---|
| Agent builder running multi-hour autonomous tasks | Very high | Fable 5 at high or xhigh effort |
| Engineer kicking off overnight migrations or audits | Effectively infinite | Fable 5 at xhigh or max effort |
| Developer in an interactive coding session | Low (seconds) | Opus 4.8, or Fable 5 at medium effort if quality demands it |
| Product team shipping user-facing chat | Very low (sub-2s first token) | Sonnet 4.6; Haiku 4.5 for the fastest paths |
| Pipeline operator running high-volume batch jobs | High, but cost-bound | Sonnet 4.6 or Opus 4.8; Fable 5 only for steps that fail on cheaper models |
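The table can be expressed as a small routing function. This is one possible encoding, not a prescribed implementation: the `Workload` and `Route` types and the `needs_fable` flag are hypothetical, each row is collapsed to its first-listed option, and the model strings are display names rather than API model IDs.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Workload(Enum):
    AUTONOMOUS_AGENT = "multi-hour autonomous tasks"
    OVERNIGHT = "overnight migrations or audits"
    INTERACTIVE_CODING = "interactive coding session"
    USER_FACING_CHAT = "user-facing chat"
    BATCH = "high-volume batch jobs"


@dataclass(frozen=True)
class Route:
    model: str
    effort: Optional[str] = None  # None: the table names no effort level


# First-listed option from each table row.
DEFAULT_ROUTES = {
    Workload.AUTONOMOUS_AGENT: Route("Claude Fable 5", "high"),
    Workload.OVERNIGHT: Route("Claude Fable 5", "xhigh"),
    Workload.INTERACTIVE_CODING: Route("Claude Opus 4.8"),
    Workload.USER_FACING_CHAT: Route("Claude Sonnet 4.6"),
    Workload.BATCH: Route("Claude Sonnet 4.6"),
}


def route(workload: Workload, needs_fable: bool = False) -> Route:
    """Return the default route, escalating only where the table allows it."""
    if needs_fable and workload is Workload.INTERACTIVE_CODING:
        return Route("Claude Fable 5", "medium")  # "if quality demands it"
    if needs_fable and workload is Workload.BATCH:
        return Route("Claude Fable 5")  # only for steps that fail on cheaper models
    return DEFAULT_ROUTES[workload]


print(route(Workload.USER_FACING_CHAT))
print(route(Workload.INTERACTIVE_CODING, needs_fable=True))
```

User-facing chat never escalates in this sketch, matching the table: the flag is ignored for workloads where the table offers no Fable 5 option.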
If the task needs Fable 5's capability but the default feels sluggish, you have levers before reaching for a different model.
Drop the effort level. This is the official, first-line control. Anthropic's guidance: "Reduce effort if a task completes but takes longer than necessary, or if you want a faster, more interactive working style." Lower effort settings on Fable 5 "still perform well and often exceed xhigh performance on prior models," and effort affects all token spend including tool calls. Our Fable 5 effort levels guide walks through each setting.
Stream everything. With streaming, perceived latency is time to first visible token, not time to completion. Summarized thinking output (thinking.display: "summarized") lets users see reasoning progress while the answer forms - very different from a dead spinner.
Cache aggressively. Prompt cache hits are billed at $1 per million tokens. Caching does not shorten thinking time, but it trims the input side of repeated long-context calls, which compounds across an agent session.
Set realistic timeouts. At max effort, a 500-token answer lands around the two-minute mark by Artificial Analysis's measurements. Client timeouts tuned for earlier model generations will kill healthy Fable 5 requests mid-thought.
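One way to derive a client timeout from the measured figures is sketched below. It assumes the worst case is TTFT plus the full output budget at the measured generation speed, multiplied by a safety margin. `client_timeout_s` is a hypothetical helper, and the default `safety_factor` of 2.0 is an arbitrary starting point to tune against your own traffic.

```python
def client_timeout_s(
    ttft_s: float,
    max_output_tokens: int,
    tokens_per_s: float,
    safety_factor: float = 2.0,
) -> float:
    """Timeout budget: (thinking wait + generation time) times a safety margin."""
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    if safety_factor < 1:
        raise ValueError("safety_factor must be >= 1")
    return safety_factor * (ttft_s + max_output_tokens / tokens_per_s)


# Max-effort Fable 5 figures from the Artificial Analysis measurements.
short_answer = client_timeout_s(109.12, max_output_tokens=500, tokens_per_s=63.4)
long_answer = client_timeout_s(109.12, max_output_tokens=4000, tokens_per_s=63.4)
print(f"500-token budget:  {short_answer:.0f}s")
print(f"4000-token budget: {long_answer:.0f}s")
```

Even the short-answer case comes out near four minutes, well beyond the 30 to 60 second timeouts common in HTTP clients. The 109.12-second TTFT is a benchmark figure for one configuration, so harder tasks may need more headroom.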
The honest version: most workloads should not be on Fable 5 at all. Skip it if your workload is latency-sensitive and user-facing - no effort setting gets Fable 5 to Sonnet 4.6's 1.44-second first token, because adaptive thinking cannot be disabled. Skip it if a cheaper model already succeeds reliably; at $10/$50 per million tokens against Opus 4.8's $5/$25, you are paying double for capability you are not using. And skip it if the 30-day data retention requirement or the safeguard classifiers conflict with your domain - covered in our migration guide.
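The price gap is easy to quantify per request. The sketch below uses the per-million-token prices quoted in this post and the $1 cache-hit rate for Fable 5. It assumes identical token counts across models and ignores any cache-write charges, both simplifications. The 50,000-input, 2,000-output request is illustrative, not a measured workload.

```python
# USD per million tokens (input, output), as quoted in this post.
PRICES = {
    "Claude Fable 5": (10.0, 50.0),
    "Claude Opus 4.8": (5.0, 25.0),
    "Claude Sonnet 4.6": (3.0, 15.0),
}
FABLE_CACHE_HIT_PER_MTOK = 1.0


def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_per_mtok: float,
    output_per_mtok: float,
    cached_tokens: int = 0,
    cache_hit_per_mtok: float = 0.0,
) -> float:
    """Cost of one request; `cached_tokens` is the input share billed at the cache-hit rate."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * input_per_mtok
        + cached_tokens * cache_hit_per_mtok
        + output_tokens * output_per_mtok
    )
    return total / 1_000_000


for model, (inp, out) in PRICES.items():
    print(f"{model}: ${request_cost_usd(50_000, 2_000, inp, out):.2f}")

fable_in, fable_out = PRICES["Claude Fable 5"]
cached = request_cost_usd(
    50_000, 2_000, fable_in, fable_out,
    cached_tokens=45_000, cache_hit_per_mtok=FABLE_CACHE_HIT_PER_MTOK,
)
print(f"Claude Fable 5 with 45,000 cached input tokens: ${cached:.3f}")
```

For this request shape, Fable 5 costs $0.60 against $0.30 for Opus 4.8 and $0.18 for Sonnet 4.6. Caching 90% of the input brings the Fable 5 figure to about $0.195, which helps the bill but does nothing for thinking time.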
Fable 5 is a specialist. It is the model you route to when the task is hard enough that quality dominates every other variable, and it rewards deployment patterns where nobody is watching the clock.
Fable 5 is slow in time to first token: Artificial Analysis measures 109.12 seconds to first answer token at max effort, versus a 2.66-second median for reasoning models in its price tier. It is not slow in generation speed: 63.4 output tokens per second is above the tier median. The delay is thinking time, which scales with the effort setting, so lower-effort requests start substantially faster than the max-effort benchmark figure.
At the same max-effort configuration, Claude Fable 5 latency is 109.12 seconds to first answer token versus 60.92 seconds for Opus 4.8. Sonnet 4.6 in its non-reasoning configuration starts answering in 1.44 seconds. Output speeds once generating are 63.4, 59.8, and 42.4 tokens per second respectively (Artificial Analysis, accessed June 11, 2026).
To make Fable 5 respond faster, lower the effort parameter - Anthropic calls it the primary control for trading off intelligence, latency, and cost on Fable 5. Medium or low effort cuts thinking depth and total token spend while remaining strong on routine tasks. Streaming with summarized thinking improves perceived latency, and prompt caching trims repeated large contexts. Thinking cannot be disabled entirely.
For agentic workloads, latency usually does not matter. In long autonomous turns, overnight runs, and batch pipelines, time to first token is a rounding error against total run time, and first-attempt success matters far more. Latency matters when a human or downstream system is actively waiting on each response - interactive chat, tight dev loops, and high-volume fan-out work belong on Opus 4.8, Sonnet 4.6, or Haiku 4.5.