TL;DR
Fable 5 long-running requests can run for many minutes per turn and hours per autonomous run. Here is how to configure client timeouts, streaming keepalive, batch polling, and background patterns so they actually finish.
Last updated: June 11, 2026
Claude Fable 5's most disruptive operational trait is not the price or the benchmarks. It is that a single API request can now run long enough to break every default assumption your HTTP stack makes. Anthropic's own prompting guide is blunt about it: individual requests on hard tasks "can run for many minutes at higher effort settings," and autonomous runs "can extend for hours." The same doc tells teams to adjust client timeouts, streaming, and progress indicators before migrating.
Most integration code assumes responses arrive in seconds. This guide covers the four layers that need attention when you point that code at claude-fable-5: client timeout configuration, streaming as keepalive, asynchronous polling with the Batches API, and background patterns for runs that outlive any single request.
Three facts stack against a naive non-streaming messages.create() call:
The SDKs default to a 10-minute timeout. For large max_tokens values on non-streaming requests the TypeScript SDK scales the timeout dynamically - the formula is 60 * 60 * maxTokens / 128000 seconds, a 60-minute ceiling at Fable 5's 128K output cap. The SDK also throws an error up front if a non-streaming request is expected to exceed roughly 10 minutes, unless you pass stream: true or override the timeout yourself.
The API itself can time out. The errors documentation lists a 504 timeout_error: "The request timed out while processing. Consider using streaming for long-running requests." Raising your client timeout does nothing about this one.
Networks drop idle connections. Anthropic's long-request guidance notes that some networks drop idle connections "after a variable period of time," causing requests to fail without ever receiving a response. The SDKs set a TCP socket keep-alive option to reduce this, but the docs are explicit that streaming or the Message Batches API is the real answer for anything that might run past 10 minutes.
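To sanity-check the arithmetic in the first point, here is a minimal sketch of the quoted formula in plain TypeScript. The function and constant names are illustrative, not SDK exports, and the SDK's internal check may differ in detail.

```typescript
// Illustrative only: mirrors the formula quoted above, not the SDK's source.
const DEFAULT_TIMEOUT_SECONDS = 10 * 60; // the SDKs' 10-minute default

// Expected non-streaming duration, in seconds, for a given max_tokens value.
function expectedNonStreamingSeconds(maxTokens: number): number {
  return (60 * 60 * maxTokens) / 128_000;
}

// True when a non-streaming request would trip the roughly-10-minute guard.
function needsStreamingOrOverride(maxTokens: number): boolean {
  return expectedNonStreamingSeconds(maxTokens) > DEFAULT_TIMEOUT_SECONDS;
}

console.log(expectedNonStreamingSeconds(128_000)); // 3600 (seconds, i.e. 60 minutes)
console.log(needsStreamingOrOverride(128_000)); // true
console.log(needsStreamingOrOverride(16_000)); // false (450 seconds expected)
```

Under this formula the guard trips at a max_tokens value a little above 21,000, so most long-form Fable 5 requests land on the streaming side of the line.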
A Fable 5 request at effort: "high" doing context gathering, building, and self-verification sits in the danger zone for all three. If you have not tuned effort levels yet, that is the first dial - lower settings still perform well and shorten turns considerably.
The minimum viable fix is making sure your client does not give up before the API does. In the TypeScript SDK:
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  timeout: 30 * 60 * 1000, // 30 minutes; also settable per-request
});
On timeout the SDK throws APIConnectionTimeoutError, and timed-out requests are retried automatically under the default policy (maxRetries: 2). That retry is a double-edged sword on expensive Fable 5 calls: a request that times out client-side may still be running server-side, and the retry starts a second one. Set maxRetries: 0 on routes where duplicate multi-minute runs would hurt.
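One way to reason about that retry risk is to write the policy down per route and compute the worst case. The sketch below is dependency-free; `timeout` and `maxRetries` mirror the SDK option names discussed above, while the route names and the `worstCase` helper are hypothetical.

```typescript
// Hypothetical per-route policy table; only the option names come from the SDK.
interface RequestPolicy {
  timeout: number; // milliseconds
  maxRetries: number;
}

const POLICIES: { classify: RequestPolicy; deepRefactor: RequestPolicy } = {
  // Cheap, fast route: the default retry count is harmless here.
  classify: { timeout: 60 * 1000, maxRetries: 2 },
  // Expensive multi-minute route: never start a duplicate run automatically.
  deepRefactor: { timeout: 30 * 60 * 1000, maxRetries: 0 },
};

// Worst case: the first attempt plus every retry each run to the full timeout.
// Backoff delays between attempts are ignored for simplicity.
function worstCase(policy: RequestPolicy): { runs: number; minutes: number } {
  const runs = 1 + policy.maxRetries;
  return { runs, minutes: (runs * policy.timeout) / 60_000 };
}

console.log(worstCase({ timeout: 30 * 60 * 1000, maxRetries: 2 })); // { runs: 3, minutes: 90 }
console.log(worstCase(POLICIES.deepRefactor)); // { runs: 1, minutes: 30 }
```

With a 30-minute timeout and the default retry count, a single stuck call could in principle start three server-side runs, which is the case for setting maxRetries: 0 on expensive routes.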
The honest assessment: timeout configuration is the weakest layer. It keeps your client from giving up early, but does nothing about the 504 path or idle-connection drops.
Streaming is Anthropic's recommended pattern for long requests, and not just for UX reasons. A streaming response delivers bytes continuously, so the connection never looks idle to the network hops between your client and the API. The event stream also includes dedicated ping events dispersed throughout the response, so even during long silent stretches - thinking, tool planning - traffic keeps flowing.
If you do not actually need incremental tokens, you can stream under the hood and still get a complete Message object back:
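// Reuses `client` from the timeout example; `messages` is your existing conversation array.
const stream = client.messages.stream({
  model: "claude-fable-5",
  max_tokens: 128000,
  messages,
});
// Streams under the hood, then resolves with the complete Message object.
const message = await stream.finalMessage();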
The SDKs require this pattern for large max_tokens values anyway - a non-streaming request at Fable 5's 128K output ceiling will be rejected by the SDK's 10-minute validation before it is ever sent.
Two caveats to handle in code. First, errors can arrive mid-stream, after the API has already returned a 200: the docs show an overloaded_error arriving as an SSE error event, where it would have been an HTTP 529 in a non-streaming context. Your stream consumer needs an error branch. Second, stream recovery changed shape: the prefill-style resume (placing the partial response in an assistant message) applies to "Claude 4.5 models and earlier" per the streaming docs. On Claude 4.6 and later - Fable 5 included - you instead put the captured partial in a user message that instructs the model to continue from where it left off, and tool-use and thinking blocks cannot be partially recovered. On tool-heavy agentic streams that often means a clean retry anyway, which is another argument for checkpointing progress externally on long runs.
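Both caveats can be sketched without the SDK. The reducer below folds a minimal subset of stream events into a state object, records a mid-stream error instead of throwing so the partial text survives, and then builds a continuation message in the user role. The event type is deliberately a small subset of the real stream, and the wording of the recovery prompt is an assumption, not text from the docs.

```typescript
// Minimal subset of the streaming event shapes; real streams carry more
// event types and fields, which this sketch ignores.
type StreamEvent =
  | { type: "ping" }
  | { type: "content_block_delta"; delta: { type: "text_delta"; text: string } }
  | { type: "message_stop" }
  | { type: "error"; error: { type: string; message: string } };

interface StreamState {
  text: string; // partial text captured so far
  done: boolean;
  error?: { type: string; message: string };
}

// Fold one event into the running state. Pings are ignored; errors are
// recorded rather than thrown so the caller keeps the partial text.
function applyEvent(state: StreamState, event: StreamEvent): StreamState {
  switch (event.type) {
    case "content_block_delta":
      return { ...state, text: state.text + event.delta.text };
    case "message_stop":
      return { ...state, done: true };
    case "error":
      return { ...state, error: event.error };
    default:
      return state; // ping
  }
}

// Hypothetical recovery prompt for Claude 4.6 and later: the partial goes in
// a user message, not an assistant prefill. The wording is illustrative.
function buildContinuation(partial: string): { role: "user"; content: string } {
  return {
    role: "user",
    content:
      "Your previous response was interrupted. Here is what you had written so far:\n\n" +
      partial +
      "\n\nContinue from exactly where it left off.",
  };
}

// Simulated stream: a ping, some text, then an error after the initial 200.
const events: StreamEvent[] = [
  { type: "ping" },
  { type: "content_block_delta", delta: { type: "text_delta", text: "Step 1 done. " } },
  { type: "error", error: { type: "overloaded_error", message: "Overloaded" } },
];
const finalState = events.reduce<StreamState>(applyEvent, { text: "", done: false });

if (finalState.error && !finalState.done) {
  const retryMessage = buildContinuation(finalState.text);
  console.log(finalState.error.type, retryMessage.role);
}
```

This only recovers text. As noted above, tool-use and thinking blocks cannot be partially recovered, so a harness built this way would still fall back to a clean retry on tool-heavy turns.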
When nobody is waiting on the response, stop holding a connection open at all. The Message Batches API flips the model: submit up to 100,000 requests (or 256 MB) in one batch, then poll processing_status until it reads ended and fetch results. Most batches finish within an hour; results are available once everything completes or after 24 hours, whichever comes first, and remain downloadable for 29 days.
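The polling side of that flow can be sketched as a capped backoff plus a readiness check. The status values are the documented processing_status states; the 30-second base and 10-minute cap are illustrative assumptions, and the status sequence is simulated rather than fetched from the API.

```typescript
// Documented processing_status values for a message batch.
type ProcessingStatus = "in_progress" | "canceling" | "ended";

// Capped exponential backoff: 30s, 60s, 120s, ... up to 10 minutes.
// The numbers are assumptions, not documented recommendations.
function nextPollDelayMs(attempt: number): number {
  const base = 30_000;
  const cap = 10 * 60_000;
  return Math.min(cap, base * 2 ** attempt);
}

// Results are only fetchable once the whole batch has ended.
function readyForResults(status: ProcessingStatus): boolean {
  return status === "ended";
}

// Total time spent sleeping for a given sequence of observed statuses.
function simulateWaitMs(statuses: ProcessingStatus[]): number {
  let waited = 0;
  let attempt = 0;
  for (const status of statuses) {
    if (readyForResults(status)) break;
    waited += nextPollDelayMs(attempt);
    attempt += 1;
  }
  return waited;
}

console.log(simulateWaitMs(["in_progress", "in_progress", "ended"])); // 90000
```

In a real harness each status would come from retrieving the batch, and the loop would live in a scheduled job rather than a blocking process.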
The kicker for Fable 5 specifically: batch processing cuts token costs by 50%. The batch pricing table lists Claude Fable 5 at $5 per million input tokens and $25 per million output - batched Fable 5 costs the same as interactive Opus 4.8. All active models support the Batches API, and nearly all Messages API features work inside a batch (streaming does not). If part of your concern is the 2x sticker price, routing bulk work through batches is the biggest lever you have.
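To make the discount concrete, here is a small cost sketch. The batch prices are the ones quoted above; the standard prices assume the batch rate is exactly half the interactive rate, and the workload (10,000 requests at 20K input and 4K output tokens each) is a made-up example.

```typescript
// Prices in USD per million tokens. Standard rates are derived from the
// stated 50% batch discount; treat them as an assumption.
const FABLE_5_PRICING = {
  standard: { input: 10, output: 50 },
  batch: { input: 5, output: 25 },
};

function costUsd(
  inputTokens: number,
  outputTokens: number,
  mode: "standard" | "batch",
): number {
  const rate = FABLE_5_PRICING[mode];
  return (inputTokens * rate.input + outputTokens * rate.output) / 1_000_000;
}

// Hypothetical workload: 10,000 requests, 20K input + 4K output tokens each.
const requestCount = 10_000;
const standardCost = costUsd(requestCount * 20_000, requestCount * 4_000, "standard");
const batchedCost = costUsd(requestCount * 20_000, requestCount * 4_000, "batch");

console.log(standardCost, batchedCost); // 4000 2000
```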
The tradeoff is plain: you give up latency guarantees entirely. A batch may finish in minutes, but requests that do not complete within 24 hours expire unbilled. For deeper production patterns, see the Claude Batch API production guide.
For agentic runs that span hours, no single HTTP request - streamed or not - is the right container. Anthropic's guidance for Fable 5 is to restructure harnesses "to check on runs asynchronously, for example through scheduled jobs, rather than blocking." In practice that means three things:
Run the loop in a worker, not a request handler. Each model turn is one streamed API call inside a loop owned by a durable process (a queue worker, a scheduled job, a container). Your user-facing surface reads state from storage; it never awaits the model directly. This is the core argument in why long-running agents need harnesses - the model can now sustain hours of productive work, so the limiting factor is whether your orchestration layer can.
Give the agent a verbatim progress channel. The prompting guide recommends a client-side send_to_user tool for long asynchronous agents: the model calls it with a message, you render the input directly in your UI and return an acknowledgement. Tool inputs are never summarized, so progress updates with specific numbers arrive intact mid-run, without ending the turn.
Ground progress claims in evidence. On long autonomous runs, Anthropic recommends instructing the model to audit every progress claim against an actual tool result from the session - in their testing this nearly eliminated fabricated status reports. If you are letting runs go overnight, the patterns in running Claude Code autonomously for hours translate directly to API harnesses: checkpoints, externalized state, and verification gates.
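The progress channel from the second point can be sketched as a tool definition plus a handler. The tool name follows the prompting guide as described above; the description text, the schema, the in-memory `progressLog`, and the acknowledgement string are all assumptions made for illustration.

```typescript
// Tool definition in the Messages API tool format. The description and
// schema wording are assumptions.
const sendToUserTool = {
  name: "send_to_user",
  description: "Send a progress update that is shown to the user verbatim.",
  input_schema: {
    type: "object",
    properties: { message: { type: "string" } },
    required: ["message"],
  },
};

interface ToolUseBlock {
  type: "tool_use";
  id: string;
  name: string;
  input: { message?: string };
}

interface ToolResultBlock {
  type: "tool_result";
  tool_use_id: string;
  content: string;
}

// Stand-in for durable storage that the user-facing surface reads from.
const progressLog: string[] = [];

// Store the tool input verbatim and acknowledge, so the turn can continue.
function handleSendToUser(block: ToolUseBlock): ToolResultBlock {
  progressLog.push(block.input.message ?? "");
  return { type: "tool_result", tool_use_id: block.id, content: "delivered" };
}

// Example tool call as it might appear in a model response (made-up values).
const ack = handleSendToUser({
  type: "tool_use",
  id: "toolu_example",
  name: sendToUserTool.name,
  input: { message: "Migrated 14 of 37 modules; 2 tests still failing." },
});

console.log(ack.tool_use_id, progressLog.length);
```

In a worker-based harness the handler would write to a database or queue instead of an array, and the acknowledgement would go back to the model as the tool result for that turn.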
Long runs also accumulate context fast. The 1M context window in practice covers how far a single session can go before compaction enters the picture.
| | Raised timeout only | Streaming | Batches API | Background worker |
|---|---|---|---|---|
| Max practical duration | ~10 to 60 min | Minutes-long turns | Up to 24 hours | Unbounded (multi-turn) |
| Protects against idle drops | No | Yes (pings + continuous bytes) | N/A (no held connection) | Yes (per-turn streams) |
| Latency | Blocking | Incremental | None guaranteed | Asynchronous |
| Cost | Standard | Standard | 50% off ($5/$25 on Fable 5) | Standard |
| Best for | Quick patch | Interactive long turns | Bulk, non-urgent work | Hours-long agent runs |
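The "Best for" row can be read as a rough decision rule. The sketch below encodes one possible reading of the table; the field names and the ordering of the checks are assumptions, not guidance from Anthropic.

```typescript
type Transport = "sdk-defaults" | "streaming" | "batches" | "background-worker";

// Hypothetical workload profile; the fields are illustrative.
interface Workload {
  someoneWaiting: boolean; // is a user blocked on the response?
  multiTurnRun: boolean; // agent loop spanning many model turns
  mayExceedTenMinutes: boolean; // a single turn could run past ~10 minutes
}

// One possible mapping from workload profile to the options in the table.
function chooseTransport(w: Workload): Transport {
  if (w.multiTurnRun) return "background-worker"; // hours-long agent runs
  if (!w.someoneWaiting) return "batches"; // bulk, non-urgent work
  if (w.mayExceedTenMinutes) return "streaming"; // interactive long turns
  return "sdk-defaults"; // short interactive requests
}

const choice = chooseTransport({
  someoneWaiting: true,
  multiTurnRun: false,
  mayExceedTenMinutes: true,
});
console.log(choice); // streaming
```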
By persona:
… display: "summarized" on thinking so users see progress instead of a long pause, and handle mid-stream error events.
… send_to_user tool, and externalized checkpoints. Budget for retries instead of resumes.
When to skip all of this: if your requests are short, interactive, and run at low or medium effort with modest max_tokens, the SDK defaults already cover you. A classification endpoint returning 200 tokens does not need a keepalive strategy. This guide matters once turns cross the multi-minute line - which on Fable 5 at high effort is the normal case, not the edge case.
Anthropic's prompting guide says individual Fable 5 requests on hard tasks can run for many minutes at higher effort settings, and autonomous multi-turn runs can extend for hours. The SDK default timeout is 10 minutes, scaling dynamically up to 60 minutes for large non-streaming max_tokens values, and both can be overridden. The API can still return a 504 timeout_error on very long processing, which is why Anthropic recommends streaming for long-running requests.
The official SDKs default to a 10-minute request timeout and throw APIConnectionTimeoutError when it elapses. They also refuse to send a non-streaming request expected to exceed roughly 10 minutes unless you pass stream: true or override the timeout option. Separately, some networks drop idle connections during long silent processing, so switching to streaming is more reliable than raising the timeout.
Anthropic's errors documentation recommends streaming or the Message Batches API for requests over 10 minutes. Use streaming when someone is waiting on the response - it keeps the connection alive with continuous events and ping frames. Use the Batches API when results are not time-sensitive: it removes the held connection entirely, processes most batches within an hour with a 24-hour ceiling, and bills at 50% of standard rates, which brings Fable 5 down to $5/$25 per million tokens.
Yes. Server-sent events flow continuously during generation, and the stream includes ping events dispersed throughout the response, which prevents the connection from appearing idle during long thinking stretches. The SDKs also set TCP socket keep-alive where the runtime supports it. Errors can still occur mid-stream after the initial 200, so handle SSE error events explicitly.