Is Claude Fable 5 Slow? Latency in Practice, and When It Matters

Q: How do I make Claude Fable 5 respond faster?

Lower the `effort` parameter - Anthropic calls it the primary control for trading off intelligence, latency, and cost on Fable 5. Medium or low effort cuts thinking depth and total token spend while remaining strong on routine tasks. Streaming with summarized thinking improves perceived latency, and prompt caching trims repeated large contexts. Thinking cannot be disabled entirely.

Last updated: June 11, 2026

Every ranking comparison published since Claude Fable 5 launched on June 9 leads with benchmark scores, pricing, and context window. Almost none mention latency - strange, because it is the first thing you notice in actual use. Simon Willison, after roughly 5.5 hours of release-day testing, summed it up in one line: "It's slow, expensive and has been quite happily churning through everything I've thrown at it so far."

So is Claude Fable 5 slow? Yes in one specific dimension, no in another, and whether either matters depends on how you deploy it. This post puts measured numbers on the question and builds a routing guide for when the wait is worth it and when to hand the request to Opus 4.8 or Sonnet 4.6 instead.

The Measured Numbers: Fable 5 vs Opus 4.8 vs Sonnet 4.6

Artificial Analysis benchmarks models through the live Anthropic API and publishes two distinct speed metrics: output speed (tokens per second once the model is generating) and time to first answer token (TTFT), which for reasoning models includes thinking time.

Here is the head-to-head from the Artificial Analysis model pages, accessed June 11, 2026. Note the configurations: Fable 5 and Opus 4.8 were tested with adaptive reasoning at max effort, Sonnet 4.6 in its non-reasoning configuration at high effort.

Metric	Claude Fable 5 (max effort)	Claude Opus 4.8 (max effort)	Claude Sonnet 4.6 (non-reasoning)
Output speed	63.4 tokens/sec	59.8 tokens/sec	42.4 tokens/sec
Time to first answer token	109.12s	60.92s	1.44s
Tier median TTFT	2.66s	2.66s	1.56s
Input / output price per MTok	$10 / $50	$5 / $25	$3 / $15

Two things jump out. First, Fable 5's raw generation speed is fine - 63.4 tokens per second is above the tier median of 62.7 and slightly faster than Opus 4.8's 59.8. Second, the 109-second time to first answer token is enormous: roughly 41x the tier median, and about 1.8x Opus 4.8's already-long 60.92 seconds at the same max-effort setting. By Artificial Analysis's end-to-end math (TTFT plus generation time for a 500-token answer), that is roughly two minutes per response at max effort, against under a second and a half before Sonnet 4.6 starts answering.
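The end-to-end figure can be reproduced from the table. This is a minimal sketch, assuming end-to-end time is simply time to first answer token plus answer tokens divided by output speed; the function and variable names are illustrative and not taken from Artificial Analysis.

```python
# Measured values from the table above (Artificial Analysis, accessed June 11, 2026).
# Each entry: (time to first answer token in seconds, output speed in tokens/sec).
MEASURED = {
    "Claude Fable 5 (max effort)": (109.12, 63.4),
    "Claude Opus 4.8 (max effort)": (60.92, 59.8),
    "Claude Sonnet 4.6 (non-reasoning)": (1.44, 42.4),
}


def end_to_end_seconds(ttft_s: float, tokens_per_s: float, answer_tokens: int = 500) -> float:
    """Assumed model: wait for the first answer token, then stream at a constant rate."""
    return ttft_s + answer_tokens / tokens_per_s


if __name__ == "__main__":
    for name, (ttft, speed) in MEASURED.items():
        print(f"{name}: {end_to_end_seconds(ttft, speed):.1f}s for a 500-token answer")
```

Under that assumption the three configurations come out at about 117, 69, and 13 seconds for a 500-token answer.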

Where the Latency Actually Comes From

The 109 seconds is not network overhead or slow inference. It is thinking time. Fable 5 runs adaptive thinking always on - thinking: {"type": "disabled"} is not supported - and decides how deeply to reason based on the task and the effort setting. Artificial Analysis benchmarks the max-effort configuration, so 109 seconds is the ceiling case, not the typical request. The API default is high, two notches below max (xhigh sits between them), and Anthropic's effort documentation calls effort "the primary control for trading off intelligence, latency, and cost on Claude Fable 5."

Willison's release-day pelican benchmark shows how effort scales the work. The same one-line SVG prompt produced 1,929 output tokens at low effort, 2,290 at medium, 2,057 at high, 5,992 at xhigh, and 14,430 at max - about 7.5x the tokens of low effort, and at 63 tokens per second, every extra token is wall-clock time.
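To put those token counts in wall-clock terms, here is a rough sketch. It assumes every output token is generated at the 63.4 tokens/sec that Artificial Analysis measured at max effort, which is a simplification: the rate may differ at other effort levels, and Willison's counts come from a single run per level.

```python
# Output tokens per effort level from Willison's release-day pelican run.
PELICAN_TOKENS = {"low": 1929, "medium": 2290, "high": 2057, "xhigh": 5992, "max": 14430}

# Assumption: a flat 63.4 tokens/sec (the Artificial Analysis max-effort figure)
# applies at every effort level.
ASSUMED_TOKENS_PER_S = 63.4


def generation_seconds(output_tokens: int, tokens_per_s: float = ASSUMED_TOKENS_PER_S) -> float:
    """Seconds spent generating the given number of output tokens at a constant rate."""
    return output_tokens / tokens_per_s


if __name__ == "__main__":
    baseline = PELICAN_TOKENS["low"]
    for level, tokens in PELICAN_TOKENS.items():
        print(f"{level:>6}: {tokens:>6} tokens, ~{generation_seconds(tokens):.0f}s, "
              f"{tokens / baseline:.1f}x low")
```

On these assumptions, low effort is about half a minute of generation and max is close to four minutes. Note that high came in below medium in this run, so token counts do not rise strictly with effort at the lower levels.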

So the precise answer to "is Fable 5 slow" is: the model thinks long before it speaks, and how long is substantially under your control. What you cannot do is turn thinking off entirely.

When Slow Is Completely Fine

For the workloads Fable 5 was actually built for, time to first token is close to irrelevant.

Long autonomous turns. Anthropic positions Fable 5 as its most capable model for "the most demanding reasoning and long-horizon agentic work." In an agent loop that runs 40 minutes and makes 200 tool calls, nobody is waiting on any single step, so the thinking pause before each one is absorbed into the run time. First-attempt success is what matters, because a failed run costs the elapsed time plus the retry. Our Fable 5 vs Opus 4.8 decision guide covers the quality side of that bet.

Overnight and unattended runs. If the agent kicks off at 11 pm and you read the results at 8 am, the difference between a 2-second and a 109-second first token is exactly zero. This is where Fable 5's profile - slow to start, thorough, strong over long horizons - is purely an asset. The overnight agents workflow post covers structuring these runs, and long-running agents need harnesses covers the guardrails.

Work measured in days, not seconds. Willison's most telling anecdote is not a latency number. He handed Fable 5 a feature for his Datasette Agent project; it shipped the feature plus four supporting improvements to his underlying LLM library. His verdict: "I spent several hours on it today, but it feels like several days' worth of work." When the unit of value is days of engineering compressed into hours, a model that takes minutes per turn is fast, not slow.

Batch and async pipelines. Anything queued - report generation, codebase audits, scheduled analysis jobs - tolerates arbitrary first-token latency. Cost matters more than speed here; see our Fable 5 pricing and cost-per-task breakdown.

When Latency Hurts: Route to Opus 4.8 or Sonnet 4.6

The flip side is just as clear: some workloads treat a long time to first token as a dealbreaker, not an inconvenience.

Interactive chat and user-facing products. No user waits 109 seconds, or even 20, staring at a spinner. Sonnet 4.6 in non-reasoning mode starts streaming in 1.44 seconds. If your product surfaces model output to users in real time, Fable 5 at high effort is the wrong default, full stop.

Tight developer loops. Pair-programming sessions, quick refactors, "explain this error" queries - the answer's value decays with every second of waiting. Anthropic's own models overview rates comparative latency as Moderate for Opus 4.8, Fast for Sonnet 4.6, and Fastest for Haiku 4.5. Fable 5 does not appear in that latency table at all.

High-throughput parallel work. Fan-out workloads multiply cost by volume, and the slowest branch sets the wall-clock time. A subagent that takes two minutes to start answering holds up every step that depends on its result. Anthropic's effort docs recommend low effort for subagent roles, and a cheaper, faster model is usually the better fit altogether.

Persona Decision Guide

You are	Your latency tolerance	Route to
Agent builder running multi-hour autonomous tasks	Very high	Fable 5 at high or xhigh effort
Engineer kicking off overnight migrations or audits	Effectively infinite	Fable 5 at xhigh or max effort
Developer in an interactive coding session	Low (seconds)	Opus 4.8, or Fable 5 at medium effort if quality demands it
Product team shipping user-facing chat	Very low (sub-2s first token)	Sonnet 4.6, Haiku 4.5 for the fastest paths
Pipeline operator running high-volume batch jobs	High, but cost-bound	Sonnet 4.6 or Opus 4.8; Fable 5 only for steps that fail on cheaper models
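
The table can be encoded as a small lookup if you want routing to live in code. This is a sketch only: the persona keys, the Route type, and the escalate flag are hypothetical names, the model strings are display names rather than API identifiers, and where the table offers two options the first is used.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    model: str
    effort: Optional[str]  # None where the table names no effort level


# Keys are hypothetical labels for the five personas in the table.
ROUTES = {
    "autonomous_agent": Route("Fable 5", "high"),    # table: high or xhigh
    "overnight_run": Route("Fable 5", "xhigh"),      # table: xhigh or max
    "interactive_coding": Route("Opus 4.8", None),   # Fable 5 at medium if quality demands it
    "user_facing_chat": Route("Sonnet 4.6", None),   # Haiku 4.5 for the fastest paths
    "high_volume_batch": Route("Sonnet 4.6", None),  # or Opus 4.8
}


def pick_route(persona: str, escalate: bool = False) -> Route:
    """Return the table's default route, or its Fable 5 fallback when escalate is set.

    escalate means "quality demands it" for interactive coding and
    "a cheaper model already failed this step" for batch jobs.
    """
    if escalate and persona == "interactive_coding":
        return Route("Fable 5", "medium")
    if escalate and persona == "high_volume_batch":
        return Route("Fable 5", None)  # the table does not name an effort level here
    return ROUTES[persona]


if __name__ == "__main__":
    print(pick_route("user_facing_chat"))
    print(pick_route("interactive_coding", escalate=True))
```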

Tuning Fable 5 Latency Without Switching Models

If the task needs Fable 5's capability but the default feels sluggish, you have levers before reaching for a different model.

Drop the effort level. This is the official, first-line control. Anthropic's guidance: "Reduce effort if a task completes but takes longer than necessary, or if you want a faster, more interactive working style." Lower effort settings on Fable 5 "still perform well and often exceed xhigh performance on prior models," and effort affects all token spend including tool calls. Our Fable 5 effort levels guide walks through each setting.

Stream everything. With streaming, perceived latency is time to first visible token, not time to completion. Summarized thinking output (thinking.display: "summarized") lets users see reasoning progress while the answer forms - very different from a dead spinner.
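
To see the gap between first visible output and completion in your own client, you can time the stream directly. The helper below is a sketch that works on any iterable of text chunks; chunks stands in for whatever iterator your SDK's streaming call returns, and no Anthropic API is called here. The fake stream and its delays are made up for illustration.

```python
import time
from typing import Callable, Iterable, Tuple


def measure_stream(chunks: Iterable[str],
                   clock: Callable[[], float] = time.monotonic) -> Tuple[float, float]:
    """Return (seconds to first chunk, seconds to completion) for a stream of chunks."""
    start = clock()
    first = None
    for _ in chunks:
        if first is None:
            first = clock() - start  # what the user perceives as "it started"
    total = clock() - start
    # An empty stream never showed anything, so report completion time for both.
    return (first if first is not None else total), total


def fake_stream():
    # Stand-in for a streaming response: a thinking summary early, the answer later.
    time.sleep(0.05)
    yield "[thinking summary] ..."
    time.sleep(0.2)
    yield "final answer"


if __name__ == "__main__":
    first, total = measure_stream(fake_stream())
    print(f"first visible chunk after {first:.2f}s, complete after {total:.2f}s")
```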

Cache aggressively. Prompt cache hits are billed at $1 per million tokens. Caching does not shorten thinking time, but it trims the input side of repeated long-context calls, which compounds across an agent session.
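
A rough sketch of what that means for the input side of a session, using the $10 input price and the $1 cache-hit price quoted in this post. It is deliberately simplified: the first call pays the full input price, every later call is treated as a cache hit, and cache-write surcharges and cache expiry are not modeled. The function name is illustrative.

```python
INPUT_PRICE_PER_MTOK = 10.00     # Fable 5 input price, USD per million tokens
CACHE_HIT_PRICE_PER_MTOK = 1.00  # cache-hit price, USD per million tokens


def input_cost_usd(context_tokens: int, calls: int, cached: bool) -> float:
    """Input-side cost of resending the same context on every call.

    Simplification: with caching, the first call pays full price and all later
    calls are cache hits. Cache-write surcharges and expiry are ignored.
    """
    if calls <= 0:
        return 0.0
    full = context_tokens * INPUT_PRICE_PER_MTOK / 1_000_000
    if not cached:
        return calls * full
    hit = context_tokens * CACHE_HIT_PRICE_PER_MTOK / 1_000_000
    return full + (calls - 1) * hit


if __name__ == "__main__":
    # Hypothetical session: a 100,000-token context resent across 200 calls.
    print(f"uncached: ${input_cost_usd(100_000, 200, cached=False):.2f}")
    print(f"cached:   ${input_cost_usd(100_000, 200, cached=True):.2f}")
```

For that hypothetical session the input bill comes to $200.00 uncached against about $20.90 with caching.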

Set realistic timeouts. At max effort, a 500-token answer lands around the two-minute mark by Artificial Analysis's measurements. Client timeouts tuned for earlier model generations will kill healthy Fable 5 requests mid-thought.
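
One way to pick a timeout is to start from the measured numbers and add headroom. The sketch below assumes the same simple model (time to first answer token plus tokens divided by output speed); the 2x headroom factor and the 60-second legacy timeout are arbitrary illustrations, not recommendations from Anthropic or Artificial Analysis.

```python
def suggested_timeout_s(ttft_s: float, tokens_per_s: float, max_answer_tokens: int,
                        headroom: float = 2.0) -> float:
    """Estimated end-to-end time multiplied by a safety factor (headroom is a guess)."""
    return headroom * (ttft_s + max_answer_tokens / tokens_per_s)


def would_time_out(client_timeout_s: float, ttft_s: float) -> bool:
    """True if the client gives up before the first answer token would arrive."""
    return client_timeout_s < ttft_s


if __name__ == "__main__":
    # Fable 5 at max effort, 500-token answer, per the Artificial Analysis figures.
    print(f"suggested timeout: {suggested_timeout_s(109.12, 63.4, 500):.0f}s")
    # A hypothetical 60-second client timeout carried over from an older integration.
    print("60s timeout kills the request:", would_time_out(60, 109.12))
```

With those inputs the suggestion is roughly 234 seconds, and a 60-second timeout would cut the request off well before the first answer token.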

When to Skip Fable 5 Entirely

The honest version: most workloads should not be on Fable 5 at all. Skip it if your workload is latency-sensitive and user-facing - no effort setting gets Fable 5 to Sonnet 4.6's 1.44-second first token, because adaptive thinking cannot be disabled. Skip it if a cheaper model already succeeds reliably; at $10/$50 per million tokens against Opus 4.8's $5/$25, you are paying double for capability you are not using. And skip it if the 30-day data retention requirement or the safeguard classifiers conflict with your domain - covered in our migration guide.

Fable 5 is a specialist. It is the model you route to when the task is hard enough that quality dominates every other variable, and it rewards deployment patterns where nobody is watching the clock.

FAQ

Is Claude Fable 5 slow?

In time to first token, yes: Artificial Analysis measures 109.12 seconds to first answer token at max effort, versus a 2.66-second median for reasoning models in its price tier. In generation speed, no: 63.4 output tokens per second is above the tier median. The delay is thinking time, which scales with the effort setting, so lower-effort requests start substantially faster than the max-effort benchmark figure.

What is Claude Fable 5's latency compared to Opus 4.8 and Sonnet 4.6?

At the same max-effort configuration, Claude Fable 5 latency is 109.12 seconds to first answer token versus 60.92 seconds for Opus 4.8. Sonnet 4.6 in its non-reasoning configuration starts answering in 1.44 seconds. Output speeds once generating are 63.4, 59.8, and 42.4 tokens per second respectively (Artificial Analysis, accessed June 11, 2026).

How do I make Claude Fable 5 respond faster?

Lower the effort parameter - Anthropic calls it the primary control for trading off intelligence, latency, and cost on Fable 5. Medium or low effort cuts thinking depth and total token spend while remaining strong on routine tasks. Streaming with summarized thinking improves perceived latency, and prompt caching trims repeated large contexts. Thinking cannot be disabled entirely.

Does Fable 5's latency matter for agentic workloads?

Usually not. In long autonomous turns, overnight runs, and batch pipelines, time to first token is a rounding error against total run time, and first-attempt success matters far more. Latency matters when a human or downstream system is actively waiting on each response - interactive chat, tight dev loops, and high-volume fan-out work belong on Opus 4.8, Sonnet 4.6, or Haiku 4.5.

Sources

Artificial Analysis: Claude Fable 5 - speed, TTFT, pricing. Accessed June 11, 2026.
Artificial Analysis: Claude Opus 4.8 - speed and latency figures. Accessed June 11, 2026.
Artificial Analysis: Claude Sonnet 4.6 - speed and latency figures. Accessed June 11, 2026.
Simon Willison: Initial impressions of Claude Fable 5 - release-day testing, effort-level token counts. Published June 9, 2026; accessed June 11, 2026.
Anthropic docs: Effort - effort levels and latency guidance. Accessed June 11, 2026.
Anthropic docs: Models overview - comparative latency ratings and pricing. Accessed June 11, 2026.
Anthropic docs: Introducing Claude Fable 5 and Claude Mythos 5 - adaptive thinking behavior. Accessed June 11, 2026.
