Handling Fable 5 Refusals: A Working Guide to the Fallback API

Developers Digest • June 10, 2026 • 10 min read

The Fable 5 Moment (series, 30 parts)

Previous in series: Claude Fable 5 API: Production Integration Patterns, Rate Limits, and Migration Gotchas

Next in series: Why Fable 5 Refuses Your Cybersecurity Queries (And How the Fallback Works)

TL;DR

Fable 5 ships with safety classifiers that route flagged requests away from the model. In production you need to handle this, and Anthropic shipped three ways to do it. Here's how each one works, with code, plus the billing rules nobody has written up.


Claude Fable 5 is the same model as the restricted-access Mythos 5, wrapped in safety classifiers. When a classifier fires, Fable 5 doesn't answer. In the Claude apps, the request silently falls back to Opus 4.8. On the API, handling that is your job.

This matters even if your product has nothing to do with security or biology. The classifiers are tuned conservatively and they catch normal work: developers reported a base64 implementation flagged as cyber, genome-alignment pipelines rerouted, and prompts asking the model to "explain its reasoning" tripping the extraction filter. Anthropic's own figure is that fewer than 5% of sessions hit a fallback. Across production traffic, that's not an edge case. That's Tuesday.

Here's how refusals work on the wire, and the three ways to handle them.

What a refusal looks like

A refusal is not an error. You get HTTP 200 with a new stop reason:

```json
{
  "id": "msg_...",
  "model": "claude-fable-5",
  "stop_reason": "refusal",
  "stop_details": {
    "type": "refusal",
    "category": "cyber",
    "explanation": "This request was flagged by safety classifiers..."
  },
  "content": [...],
  "usage": {...}
}
```

Three things to know:

- stop_details.category is "cyber", "bio", "reasoning_extraction", or null.
- A refusal can happen before any output or mid-stream. If it fires before any output, you are not billed and it doesn't count against rate limits. Mid-stream, you pay for input plus whatever has already streamed.
- If your code only checks for HTTP errors, refusals will surface as mysteriously short responses. Check stop_reason on every call.
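Because a refusal arrives as a normal 200, that check has to run on every response. Here is a minimal sketch against the JSON shape above, treating the body as a plain dict; `check_refusal` is a hypothetical helper name, not an SDK function. It returns a pair so that a refusal with a null category stays distinguishable from no refusal at all.

```python
from typing import Any, Optional


def check_refusal(response: dict[str, Any]) -> tuple[bool, Optional[str]]:
    """Return (refused, category) for a Messages API response body.

    Hypothetical helper written against the refusal JSON shown above.
    A refusal is an HTTP 200, so HTTP-level error handling never sees it.
    """
    if response.get("stop_reason") != "refusal":
        return False, None
    details = response.get("stop_details") or {}
    # category is "cyber", "bio", "reasoning_extraction", or null (None)
    return True, details.get("category")
```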

Option 1: Manual retry on Opus 4.8

The simplest approach. Catch the refusal, replay the conversation on claude-opus-4-8:

```python
def create_with_fallback(client, **kwargs):
    # kwargs must not include "model"; this function chooses the model itself
    response = client.messages.create(model="claude-fable-5", **kwargs)

    if response.stop_reason == "refusal":
        # Strip thinking blocks before cross-model replay.
        # strip_thinking_blocks is not defined in this snippet.
        clean_messages = strip_thinking_blocks(kwargs["messages"])
        response = client.messages.create(
            model="claude-opus-4-8",
            **{**kwargs, "messages": clean_messages},
        )

    return response
```

The gotcha is in that strip_thinking_blocks call. Thinking blocks are model-specific: you pass them back unchanged on same-model multi-turn conversations, but you must strip thinking and redacted_thinking blocks when replaying history on a different model.
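The snippet above calls strip_thinking_blocks without defining it. A minimal sketch of that helper, assuming messages are plain dicts in Messages API format (SDK content objects would need converting to dicts first); the name comes from the snippet, the implementation is illustrative:

```python
from typing import Any

# Block types tied to the model that produced them.
THINKING_TYPES = {"thinking", "redacted_thinking"}


def strip_thinking_blocks(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return a copy of messages with thinking/redacted_thinking blocks removed.

    Hypothetical implementation. String content passes through untouched;
    only list content can hold thinking blocks. A message left with no
    blocks is dropped, because the API rejects empty content.
    """
    cleaned = []
    for message in messages:
        content = message.get("content")
        if isinstance(content, list):
            kept = [b for b in content if b.get("type") not in THINKING_TYPES]
            if not kept:
                continue  # nothing left to replay for this turn
            message = {**message, "content": kept}  # copy; caller's list is untouched
        cleaned.append(message)
    return cleaned
```

Dropping an assistant turn that held only thinking blocks can leave two user turns adjacent; if that matters for your prompts, merge them instead of dropping.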

The bigger problem with manual retry is cost. Prompt caches are per-model. If you've built up a large cached prefix on Fable 5 and a refusal forces you to Opus 4.8, you pay full cache-write costs all over again on the new model. On a long agentic session with hundreds of thousands of cached tokens, that's real money. Anthropic shipped a fix for this, covered below.
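To put a number on it, here is a back-of-envelope sketch. It assumes Opus 4.8 input costs $5 per million tokens (the price given in the billing section) and that cache writes carry a 1.25x multiplier, which is what current Claude models charge for 5-minute cache writes. Both constants are assumptions to check against the pricing page; `cache_rewrite_cost` is a hypothetical helper.

```python
# Assumptions: Opus 4.8 input at $5 per million tokens (the price quoted in
# the billing section) and a 1.25x multiplier for 5-minute cache writes, as
# on current Claude models. Verify both against the pricing page.
OPUS_48_INPUT_PER_MTOK = 5.00
CACHE_WRITE_MULTIPLIER = 1.25


def cache_rewrite_cost(cached_tokens: int,
                       input_per_mtok: float = OPUS_48_INPUT_PER_MTOK,
                       write_multiplier: float = CACHE_WRITE_MULTIPLIER) -> float:
    """Dollars spent re-writing a cached prefix after switching models."""
    return cached_tokens / 1_000_000 * input_per_mtok * write_multiplier
```

Under these assumptions, re-caching a 300,000-token prefix costs $1.875 every time a session moves to Opus 4.8, before a single output token is generated.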


Option 2: Server-side fallback (the good one)

New with Fable 5, in beta. You declare fallback models in the request and Anthropic handles the reroute server-side:

```python
response = client.messages.create(
    model="claude-fable-5",
    max_tokens=32000,
    messages=messages,
    fallbacks=[{"model": "claude-opus-4-8"}],
    extra_headers={"anthropic-beta": "server-side-fallback-2026-06-01"},
)
```

Details that matter:

- The beta header must be exactly server-side-fallback-2026-06-01. Other date values return a 400.
- Up to 3 fallback models, tried in order. Each entry can override max_tokens and thinking for that attempt only, which you'll want, because Opus 4.8 has different thinking semantics than Fable 5.
- Only a safety-classifier decline triggers the fallback. Rate limits, overload, and 5xx errors do not. This is not a general resilience mechanism; pair it with your normal retry logic.
- Permitted fallback targets are published per-model as allowed_fallback_models on the Models API when you set the beta header.
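Those constraints are easy to check client-side before a request goes out. A sketch, assuming the per-entry overrides use the same `max_tokens` and `thinking` key names as the top-level request (the docs summary above says the overrides exist, not their exact shape) and that you fetched `allowed_fallback_models` from the Models API yourself. `fallback_entry` and `build_fallback_kwargs` are hypothetical helpers.

```python
from typing import Any, Optional

BETA_HEADER = "server-side-fallback-2026-06-01"  # any other date returns a 400
MAX_FALLBACKS = 3


def fallback_entry(model: str, max_tokens: Optional[int] = None,
                   thinking: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Build one fallbacks[] entry.

    Assumption: overrides use the same key names as the top-level request.
    """
    entry: dict[str, Any] = {"model": model}
    if max_tokens is not None:
        entry["max_tokens"] = max_tokens
    if thinking is not None:
        entry["thinking"] = thinking
    return entry


def build_fallback_kwargs(fallbacks: list[dict[str, Any]],
                          allowed: Optional[set[str]] = None) -> dict[str, Any]:
    """Validate a fallback chain locally and return kwargs for messages.create."""
    if not 1 <= len(fallbacks) <= MAX_FALLBACKS:
        raise ValueError(f"expected 1-{MAX_FALLBACKS} fallbacks, got {len(fallbacks)}")
    if allowed is not None:
        bad = [f["model"] for f in fallbacks if f["model"] not in allowed]
        if bad:
            raise ValueError(f"not in allowed_fallback_models: {bad}")
    return {"fallbacks": fallbacks, "extra_headers": {"anthropic-beta": BETA_HEADER}}
```

The returned dict is meant to be unpacked into the call shown earlier, for example `client.messages.create(model="claude-fable-5", max_tokens=32000, messages=messages, **build_fallback_kwargs([fallback_entry("claude-opus-4-8", max_tokens=16000)]))`.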

The response tells you exactly what happened. The top-level model field reports whichever model actually answered, and a new fallback content block marks the handoff:

```json
{"type": "fallback", "from": {"model": "claude-fable-5"}, "to": {"model": "claude-opus-4-8"}}
```

Finally, usage.iterations[] breaks out token usage per attempted model, which is what you want for cost attribution.
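For cost attribution you usually want two things from that response: the path the request took and the tokens each model consumed. A sketch over the response as a plain dict. The `from`/`to` shape matches the fallback block above, but the per-iteration field names (`model`, `input_tokens`, `output_tokens`) are assumptions, since only `usage.iterations[]` itself is named. `fallback_path` and `usage_by_model` are hypothetical helpers.

```python
from collections import defaultdict
from typing import Any


def fallback_path(response: dict[str, Any]) -> list[str]:
    """Models the request passed through, in order, read from fallback blocks."""
    path: list[str] = []
    for block in response.get("content", []):
        if block.get("type") == "fallback":
            if not path:
                path.append(block["from"]["model"])
            path.append(block["to"]["model"])
    # No fallback block means the requested model answered directly.
    return path or [response["model"]]


def usage_by_model(response: dict[str, Any]) -> dict[str, dict[str, int]]:
    """Sum tokens per model from usage.iterations[].

    Assumption: each iteration carries model, input_tokens, output_tokens.
    """
    totals: dict[str, dict[str, int]] = defaultdict(
        lambda: {"input_tokens": 0, "output_tokens": 0})
    for it in response.get("usage", {}).get("iterations", []):
        t = totals[it["model"]]
        t["input_tokens"] += it.get("input_tokens", 0)
        t["output_tokens"] += it.get("output_tokens", 0)
    return dict(totals)
```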

Availability caveat: server-side fallback works on the Claude API and Claude Platform on AWS only. It's rejected on the Batches API, and it doesn't exist on Bedrock, Vertex, or Microsoft Foundry. On those platforms you need option 3.

Option 3: SDK middleware

Anthropic shipped refusal-fallback middleware for the TypeScript, Python, Go, Java, and C# SDKs alongside the launch. It implements the catch-refusal-retry-and-bill-correctly loop client-side:

```typescript
import Anthropic from "@anthropic-ai/sdk";
import { refusalFallback } from "@anthropic-ai/sdk/middleware";

const client = new Anthropic({
  middleware: [refusalFallback({ fallbackModel: "claude-opus-4-8" })],
});
```

This is the right answer on Bedrock and Vertex where server-side fallback isn't available, and for teams who want the behavior without taking a beta header into production. It also applies the fallback credit automatically, which brings us to billing.

The billing rules

Scattered across three docs pages, collected here:

- Refusal before any output: free. No billing, no rate-limit hit.
- Mid-stream refusal: you pay for input plus already-streamed output at Fable 5 rates.
- The fallback answer is billed at the fallback model's rates. Anthropic states you won't be charged Fable prices for rerouted requests: Opus 4.8 answers cost Opus 4.8 prices ($5/$25 per MTok input/output, versus Fable 5's $10/$50).
- Cache re-write costs are refundable via the fallback-credit beta.
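A worked sketch of those rules for the costliest case, a mid-stream refusal followed by an Opus 4.8 answer, using the per-MTok prices listed above. It assumes the fallback re-sends the same input, ignores prompt caching, and treats zero streamed tokens as a pre-output refusal. `attempt_cost` and `refused_request_cost` are hypothetical helpers.

```python
# Per-million-token prices (input, output) quoted in this article.
PRICES = {
    "claude-fable-5": (10.00, 50.00),
    "claude-opus-4-8": (5.00, 25.00),
}


def attempt_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one attempt at the given model's rates."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def refused_request_cost(input_tokens: int, streamed_tokens: int,
                         fallback_output_tokens: int,
                         fallback_model: str = "claude-opus-4-8") -> float:
    """Total cost of a Fable 5 refusal plus the fallback answer."""
    # Refused before any output: free. Mid-stream: input plus streamed
    # output, both at Fable 5 rates.
    fable_leg = 0.0 if streamed_tokens == 0 else attempt_cost(
        "claude-fable-5", input_tokens, streamed_tokens)
    # Fallback answer: the fallback model's own rates, same input re-sent.
    fallback_leg = attempt_cost(fallback_model, input_tokens, fallback_output_tokens)
    return fable_leg + fallback_leg
```

With 20,000 input tokens, 500 tokens streamed before the refusal, and a 2,000-token Opus 4.8 answer, that's $0.225 for the Fable 5 leg plus $0.15 for the fallback, $0.375 total. Had the refusal fired before any output, only the $0.15 fallback would be billed.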

That last one deserves explanation. When a refusal forces you onto a new model, the refusal response carries a one-time token in stop_details:

```json
"stop_details": {
  "type": "refusal",
  "fallback_credit_token": "fct_...",
  "fallback_has_prefill_claim": true
}
```

Echo it as a top-level fallback_credit_token on your retry (with the fallback-credit-2026-06-01 beta header, which the server-side-fallback header also grants), and the retry is billed as if the conversation had always been on the new model. No double-paying for cache writes.

You only need to touch this if you're hand-rolling retries in Ruby, PHP, or raw HTTP. Server-side fallback and the SDK middleware both apply it for you.
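For the hand-rolled case, here is a sketch of the retry in raw HTTP terms, using Python's standard library to build (not send) the request so the wire format is visible. The `fallback_credit_token` field and `fallback-credit-2026-06-01` header come from the text above; `x-api-key` and `anthropic-version: 2023-06-01` are the standard Messages API headers. `build_credited_retry` is a hypothetical helper, and it also strips thinking blocks for the cross-model replay as Option 1 requires.

```python
import json
import urllib.request
from typing import Any

API_URL = "https://api.anthropic.com/v1/messages"
THINKING_TYPES = {"thinking", "redacted_thinking"}


def build_credited_retry(original_body: dict[str, Any], refusal: dict[str, Any],
                         api_key: str,
                         fallback_model: str = "claude-opus-4-8") -> urllib.request.Request:
    """Build, but do not send, the retry request that follows a refusal."""
    # Thinking blocks are model-specific: drop them for the cross-model replay.
    messages = []
    for message in original_body["messages"]:
        content = message["content"]
        if isinstance(content, list):
            content = [b for b in content if b.get("type") not in THINKING_TYPES]
            if not content:
                continue
        messages.append({**message, "content": content})

    body = {**original_body, "model": fallback_model, "messages": messages}
    headers = {
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    }

    # Echo the one-time token at the top level, with the fallback-credit beta.
    token = (refusal.get("stop_details") or {}).get("fallback_credit_token")
    if token:
        body["fallback_credit_token"] = token
        headers["anthropic-beta"] = "fallback-credit-2026-06-01"

    return urllib.request.Request(
        API_URL, data=json.dumps(body).encode("utf-8"), headers=headers, method="POST")
```

Sending it is then `urllib.request.urlopen(req)`; the same body and headers translate directly to Ruby's Net::HTTP or PHP's curl.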

Which option to use

- New project on the Claude API: server-side fallback. Least code, correct billing, per-model usage breakdown.
- Bedrock, Vertex, or Foundry: SDK middleware. Server-side fallback doesn't exist there.
- Batch workloads: manual handling. The Batches API rejects the fallbacks parameter, so split refused items into a retry batch against Opus 4.8.
- No SDK (raw HTTP): manual retry plus the fallback-credit token.
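A sketch of that split for batch workloads. It assumes a refused item shows up as a `succeeded` result whose message has `stop_reason: "refusal"` (it is an HTTP 200, after all, but confirm this against real results). Request and result shapes follow the Message Batches format (`custom_id`, `params`, `result.type`, `result.message`); `build_retry_batch` is a hypothetical helper.

```python
from typing import Any


def build_retry_batch(requests: list[dict[str, Any]],
                      results: list[dict[str, Any]],
                      fallback_model: str = "claude-opus-4-8") -> list[dict[str, Any]]:
    """Collect refused items from a finished batch into requests for a retry batch.

    Assumption: refusals arrive as succeeded results with stop_reason "refusal".
    """
    refused_ids = {
        r["custom_id"]
        for r in results
        if r["result"]["type"] == "succeeded"
        and r["result"]["message"].get("stop_reason") == "refusal"
    }
    retry = []
    for request in requests:
        if request["custom_id"] not in refused_ids:
            continue
        # Copy params so the original batch definition is left untouched.
        # If params carry earlier assistant turns with thinking blocks,
        # strip those too before replaying on another model (see Option 1).
        params = {**request["params"], "model": fallback_model}
        # Reusing custom_id makes it easy to join retry results to the originals.
        retry.append({"custom_id": request["custom_id"], "params": params})
    return retry
```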

Reduce refusals before you handle them

The cheapest refusal is the one that never fires. Two prompt-level fixes with outsized impact:

First, remove any instruction asking the model to reproduce its reasoning in responses. "Show your thought process" style prompts trigger the reasoning_extraction classifier. If you need process visibility, set thinking: {"display": "summarized"} and read the thinking blocks.
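A quick way to run that audit over system prompts and templates. The patterns below are illustrative guesses at the kind of phrasing that trips the filter, not the classifier's actual rules; extend them with whatever your own refusal logs point to. `audit_prompt` is a hypothetical helper.

```python
import re

# Heuristic patterns only: phrasings that ask the model to reproduce its
# reasoning in the visible response. Not the classifier's real rules.
REASONING_PATTERNS = [
    r"show (me )?your (thought process|reasoning|thinking)",
    r"explain your reasoning",
    r"think (out loud|step[- ]by[- ]step) in your (answer|response)",
    r"(include|output|print|write out) your (chain of thought|reasoning)",
]
_COMPILED = [re.compile(p, re.IGNORECASE) for p in REASONING_PATTERNS]


def audit_prompt(text: str) -> list[str]:
    """Return every phrase in text that matches a reasoning-reproduction pattern."""
    return [m.group(0) for rx in _COMPILED for m in rx.finditer(text)]
```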

Second, if you work in security or life sciences legitimately, expect false positives and design for them rather than fighting them. The fallback path to Opus 4.8 is the supported answer. Opus 4.8 has no blocking cyber safeguards and remains a very capable model; for flagged requests, a clean fallback is a better experience than an argument with a classifier.

Log every refusal with its category. A week of data will tell you whether your workload has a 0.1% fallback rate or a 15% one, and that number should drive how much engineering you invest here.
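A minimal sketch of that logging, kept independent of any SDK by reading the response as a plain dict (field names as in the refusal JSON earlier). `RefusalStats` is a hypothetical class; in production you would more likely emit these counts to your metrics system than keep them in memory.

```python
import logging
from collections import Counter
from typing import Any

log = logging.getLogger("refusals")


class RefusalStats:
    """Count calls and refusals by category to estimate your fallback rate."""

    def __init__(self) -> None:
        self.calls = 0
        self.by_category: Counter[str] = Counter()

    def record(self, response: dict[str, Any]) -> None:
        self.calls += 1
        if response.get("stop_reason") != "refusal":
            return
        # A null category is counted as "uncategorized" rather than lost.
        category = (response.get("stop_details") or {}).get("category") or "uncategorized"
        self.by_category[category] += 1
        log.warning("refusal id=%s category=%s", response.get("id"), category)

    def rate(self) -> float:
        """Fraction of recorded calls that ended in a refusal."""
        return sum(self.by_category.values()) / self.calls if self.calls else 0.0
```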

Sources: Anthropic's refusals and fallback, fallback credit, SDK middleware docs, and the fallback billing cookbook.
