Fable 5 Task Budgets: Capping Agent Spend Before It Happens


Last updated: June 11, 2026

Most cost controls for LLM agents are reactive. max_tokens cuts the response off after the spend has happened. The Usage API tells you about the damage roughly five minutes after the fact. Billing alerts tell you the next morning. Task budgets, a beta feature Anthropic supports on Claude Fable 5 from launch day, are the first control that works the other way around: you tell the model how many tokens it has for the whole job, and the model sees a running countdown and paces itself.

That distinction matters at Fable 5 prices. The model bills $10 per million input tokens and $50 per million output tokens, double Opus 4.8 on every token type, and the new tokenizer used by Opus 4.7 and later can produce up to 35% more tokens for the same text. An agent loop quietly running all night at those rates is exactly the scenario that produces a $400 overnight bill. Task budgets will not make that impossible - they are advisory, a caveat we will get to - but they change the failure mode from "truncated mid-action" to "wrapped up gracefully with a partial report."
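To make those rates concrete, here is a minimal sketch that converts token counts into dollars at the $10/$50 list prices quoted above. The `inflation` parameter is our own illustrative knob for the tokenizer effect (1.35 models the "up to 35%" worst case when your counts came from an older tokenizer), and the sketch ignores prompt caching and any other pricing modifiers. The token mix is hypothetical, chosen only to show how quickly a loop reaches $400.

```python
# Fable 5 list prices quoted above, in USD per million tokens.
INPUT_USD_PER_MTOK = 10.0
OUTPUT_USD_PER_MTOK = 50.0


def estimate_cost(input_tokens: int, output_tokens: int, inflation: float = 1.0) -> float:
    """Rough USD cost of a run, ignoring prompt caching and other discounts.

    `inflation` scales both token counts, e.g. 1.35 to model the worst-case
    tokenizer growth when the counts came from an older tokenizer.
    """
    billed_input = input_tokens * inflation
    billed_output = output_tokens * inflation
    return (billed_input * INPUT_USD_PER_MTOK + billed_output * OUTPUT_USD_PER_MTOK) / 1_000_000


# One hypothetical overnight mix: 20M input tokens read, 4M output tokens written.
print(f"${estimate_cost(20_000_000, 4_000_000):,.2f}")        # $400.00
print(f"${estimate_cost(20_000_000, 4_000_000, 1.35):,.2f}")  # same text, worst-case tokenizer
```

Note that output tokens dominate: at five times the input price, 4M output tokens cost as much as 20M input tokens.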

What a Task Budget Actually Is

Per Anthropic's task budgets documentation, a task budget is a token allowance for a full agentic loop: thinking, tool calls, tool results, and final output, potentially spanning many API requests. The server injects a budget-countdown marker into the conversation that only the model can see. As Claude generates thinking and tool calls, and as it processes tool results, the countdown drops, and the model uses that signal to prioritize work and finish cleanly as the budget runs out.

Three properties define the feature:

It is advisory, not enforced. The docs are explicit that a task budget is "a soft hint, not a hard cap." Claude may exceed the budget if interrupting an in-flight action would be more disruptive than finishing it. The enforced limit remains max_tokens, which truncates with stop_reason: "max_tokens".
It spans the loop, not the request. max_tokens caps one response. A task budget covers everything the model processes across the whole multi-turn tool loop. The two values are independent; neither has to be smaller than the other.
The countdown is invisible to you. API responses carry no remaining-budget field, and the SDKs have no accessor for it. If you want client-side tracking, you sum usage across requests yourself.
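Since there is no remaining-budget field, client-side tracking means summing `usage` yourself. The sketch below assumes each response exposes `usage.input_tokens` and `usage.output_tokens`, as the Python SDK's `Message` objects do; `LoopUsage` is a hypothetical helper, and the stand-in responses are made up so it runs offline. Treat it as a billing tally, not a mirror of the server's countdown: resent history is billed as input on every request but is not charged against the budget again.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class LoopUsage:
    """Hypothetical accumulator for billed usage across one agentic loop."""
    input_tokens: int = 0
    output_tokens: int = 0
    requests: int = 0

    def add(self, response) -> None:
        # Accepts any object with .usage.input_tokens and .usage.output_tokens,
        # such as the Message returned by stream.get_final_message().
        self.input_tokens += response.usage.input_tokens
        self.output_tokens += response.usage.output_tokens
        self.requests += 1


# Stand-in responses so the sketch runs without network access.
tracker = LoopUsage()
for inp, out in [(1_200, 5_000), (9_100, 4_000), (14_400, 6_000)]:
    tracker.add(SimpleNamespace(usage=SimpleNamespace(input_tokens=inp, output_tokens=out)))

print(tracker)  # LoopUsage(input_tokens=24700, output_tokens=15000, requests=3)
```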

The beta is currently supported on Claude Fable 5, Claude Mythos 5, Claude Opus 4.8, and Claude Opus 4.7. It is not supported on Opus 4.6, Sonnet 4.6, or Haiku 4.5, and notably it is not available in Claude Code or Cowork - this is a Messages API feature only. The Fable 5 launch documentation lists task budgets among the features supported on day one.

Setting One Up

You opt in with the task-budgets-2026-03-13 beta header and add a task_budget object to output_config. This is the documented shape, adapted to Fable 5 (the docs' own example uses Opus 4.8; the request shape is identical):

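```python
import anthropic

client = anthropic.Anthropic()

# Task budgets are a beta feature, so the request goes through client.beta
# with the task-budgets-2026-03-13 beta header.
with client.beta.messages.stream(
    model="claude-fable-5",
    max_tokens=128000,  # hard per-request cap; requests this large should stream
    output_config={
        "effort": "high",
        "task_budget": {"type": "tokens", "total": 64000},  # advisory, loop-wide
    },
    messages=[
        {"role": "user", "content": "Review the codebase and propose a refactor plan."}
    ],
    betas=["task-budgets-2026-03-13"],
) as stream:
    response = stream.get_final_message()

print(response.usage)
```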

The task_budget object takes three fields:

type: always "tokens".
total: the token allowance for the loop. Minimum 20,000 - lower values return a 400 error.
remaining (optional): a carried-over remainder, defaulting to total when omitted. More on when to use this below.
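Those three fields are simple enough to validate before a request leaves your process. This is a minimal sketch; `make_task_budget` is a hypothetical helper, and the only rule it enforces is the documented 20,000-token minimum, so a bad value fails locally instead of as a 400 from the API.

```python
from typing import Optional

MIN_TASK_BUDGET = 20_000  # documented floor; smaller totals return a 400 error


def make_task_budget(total: int, remaining: Optional[int] = None) -> dict:
    """Build a task_budget object for output_config (hypothetical helper)."""
    if total < MIN_TASK_BUDGET:
        raise ValueError(f"task_budget total must be at least {MIN_TASK_BUDGET}, got {total}")
    budget = {"type": "tokens", "total": total}
    # Leave `remaining` out unless you are carrying a remainder forward;
    # the server defaults it to `total` when it is omitted.
    if remaining is not None:
        budget["remaining"] = remaining
    return budget


output_config = {"effort": "high", "task_budget": make_task_budget(64_000)}
print(output_config)
# {'effort': 'high', 'task_budget': {'type': 'tokens', 'total': 64000}}
```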

On Fable 5 you do not pass a thinking parameter at all - adaptive thinking is always on - so effort plus task_budget inside output_config is the complete spend-shaping surface for a request.


How the Countdown Counts Tokens

This is the part most likely to trip you up. The budget counts what Claude sees in the current loop, not what your client transmits. In a normal agentic loop you resend the full conversation history on every request, so your payload grows turn over turn - but resent history is not counted against the budget again. Only new content is: the tokens Claude generates this turn, plus the tool results you append.

The docs walk through a worked example with a 100,000-token budget on a security-audit loop. Turn one costs about 5,000 tokens of thinking plus a tool call. Turn two adds a 2,800-token tool result and 4,000 tokens of generation. Turn three adds a 1,200-token result and a 6,000-token final report. Total counted against the budget: 19,000 tokens, even though the client transmitted over 20,000 tokens of cumulative payload across the three requests.
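That arithmetic is easy to replay. The sketch counts only new content (tool results appended plus tokens generated) against the budget while separately tracking what the client resends on each request. Following the docs' tally, the initial user prompt is left out of the budget count; the hypothetical `prompt_tokens` argument only inflates the transmitted total.

```python
# (tool-result tokens appended this turn, tokens Claude generated this turn)
TURNS = [
    (0, 5_000),      # turn 1: ~5,000 tokens of thinking plus a tool call
    (2_800, 4_000),  # turn 2: 2,800-token tool result, 4,000 tokens generated
    (1_200, 6_000),  # turn 3: 1,200-token tool result, 6,000-token final report
]


def budget_vs_payload(turns, prompt_tokens=0):
    """Return (tokens counted against the budget, cumulative tokens resent)."""
    history = prompt_tokens  # conversation the client holds and resends
    counted = 0
    transmitted = 0
    for tool_result, generated in turns:
        history += tool_result               # client appends the new tool result
        transmitted += history               # every request resends the full history
        counted += tool_result + generated   # only new content hits the budget
        history += generated                 # the assistant turn joins the history
    return counted, transmitted


print(budget_vs_payload(TURNS))  # (19000, 20800)
```

Even with an empty prompt the client resends 20,800 tokens; any real prompt adds its size once per request to that figure, while the budget count stays at 19,000.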

Two practical consequences:

Do not try to mirror the countdown client-side and feed it back. If you decrement remaining on each follow-up while also resending full history, the model sees an under-reported budget and wraps up earlier than it should. The docs' guidance: set a generous budget once and let the server track it.
There is a caching interaction. The countdown marker is injected server-side per turn and does not participate in your cache prefix. But a client-mutated remaining value does - changing it on every request invalidates any cache prefix that contains it. If you have tuned your loop using Fable 5's prompt caching economics, a mutated budget field can quietly erase those savings.

The one sanctioned use of remaining is compaction. If your loop summarizes or rewrites earlier context between requests, the server loses track of pre-compaction spend, so you pass remaining: total - tokens_spent_so_far on the next request to keep the countdown honest.
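Both rules (never decrement between ordinary turns, carry the remainder forward only after compaction) fit in one small function. This is a sketch under assumptions: `task_budget_for_request` is hypothetical, `spent_before_compaction` stands for whatever running total of budget-counted tokens you keep (for example, generated tokens plus appended tool results), and clamping at zero is our choice, not documented behavior.

```python
from typing import Optional


def task_budget_for_request(total: int, spent_before_compaction: Optional[int] = None) -> dict:
    """task_budget object for the next request in a loop (hypothetical helper).

    Without compaction, return the same object every turn: the server tracks the
    countdown, and an unchanged value keeps any cache prefix containing it stable.
    After compaction, the server can no longer see earlier spend, so carry it
    forward explicitly as remaining = total - tokens spent so far.
    """
    budget = {"type": "tokens", "total": total}
    if spent_before_compaction is not None:
        # Assumption: clamp at zero instead of sending a negative remainder.
        budget["remaining"] = max(0, total - spent_before_compaction)
    return budget


first = task_budget_for_request(200_000)
later = task_budget_for_request(200_000)
assert first == later  # no client-side decrementing between ordinary turns

after_compaction = task_budget_for_request(200_000, spent_before_compaction=45_000)
print(after_compaction)  # {'type': 'tokens', 'total': 200000, 'remaining': 155000}
```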

The Gotcha: Budgets That Are Too Small

The documentation warns that a budget clearly insufficient for the task can cause refusal-like behavior. Give Fable 5 a 20,000-token budget for a multi-hour coding task and it may decline to start, scope the work down aggressively, or stop early with a partial result rather than begin something it cannot finish. If you see unexpected refusals or premature stops after adding a budget, raise the budget before debugging anything else.

The sizing method Anthropic recommends is empirical: run a representative sample of tasks without a budget, sum usage.output_tokens plus tool-result tokens across every request in each loop, and start from the p99 of that distribution. Then tune down and re-test. This pairs naturally with the measurement harness you probably already built if you have done production cost modeling for Fable 5 - the same per-task token distribution drives both exercises.
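The sizing step can be sketched in a few lines. The record shape is an assumption: `output_tokens` would come from each response's `usage`, but `tool_result_tokens` is not a `usage` field, so you would have to measure and record it yourself. The nearest-rank percentile is our choice of method; the helpers are hypothetical, and one sample loop is far too few for real sizing.

```python
MIN_TASK_BUDGET = 20_000  # documented minimum for task_budget.total


def loop_total(requests: list) -> int:
    """Budget-relevant tokens for one loop: output tokens plus tool-result tokens."""
    return sum(r["output_tokens"] + r["tool_result_tokens"] for r in requests)


def percentile_nearest_rank(values: list, pct: int) -> int:
    """Nearest-rank percentile with integer math, avoiding float rounding."""
    if not values:
        raise ValueError("need at least one measured loop")
    ordered = sorted(values)
    rank = -(-pct * len(ordered) // 100)  # ceil(pct * n / 100)
    return ordered[rank - 1]


def suggest_budget(loop_totals: list, pct: int = 99) -> int:
    """Starting task_budget: the p99 loop total, never below the 20,000 floor."""
    return max(MIN_TASK_BUDGET, percentile_nearest_rank(loop_totals, pct))


# Hypothetical measurements for a single loop shaped like the docs' example.
loops = [[
    {"output_tokens": 5_000, "tool_result_tokens": 0},
    {"output_tokens": 4_000, "tool_result_tokens": 2_800},
    {"output_tokens": 6_000, "tool_result_tokens": 1_200},
]]
totals = [loop_total(loop) for loop in loops]
print(totals, suggest_budget(totals))  # [19000] 20000
```

Clamping to the floor keeps the suggestion valid, but a p99 that sits near 20,000 is also the zone where the too-small-budget behavior described above is most likely, so re-test there rather than trusting the number blindly.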

Where Task Budgets Fit in the Cost-Control Stack

Task budgets are one layer of four, and they only work as designed when the other three are set deliberately.

max_tokens: the hard ceiling

Still the only enforced limit. The docs recommend combining both: task_budget as the target the model paces against, max_tokens as the ceiling that prevents runaway generation on any single request. At xhigh or max effort, set max_tokens to at least 64,000 so the model has room to think and act per request.
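One way to keep the two limits consistent with the effort setting is a small helper that applies the 64,000 floor at xhigh and max. `request_limits` and `EFFORT_LEVELS` are hypothetical; the five level names are the ones this article uses, so check them against the effort documentation before relying on the list.

```python
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")  # levels named in this article
HIGH_EFFORT_MIN_MAX_TOKENS = 64_000  # docs' guidance for xhigh and max effort


def request_limits(effort: str = "high", max_tokens: int = 64_000) -> dict:
    """Pair an effort level with a max_tokens backstop (hypothetical helper)."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort!r}")
    if effort in ("xhigh", "max"):
        # Leave the model room to think and act on each request at high effort.
        max_tokens = max(max_tokens, HIGH_EFFORT_MIN_MAX_TOKENS)
    return {"max_tokens": max_tokens, "output_config": {"effort": effort}}


print(request_limits("xhigh", 32_000))
# {'max_tokens': 64000, 'output_config': {'effort': 'xhigh'}}
```

Silently raising `max_tokens` is one design choice; throwing an error instead would surface the misconfiguration rather than quietly loosening the per-request ceiling.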

Effort: depth per step

The effort parameter (low through max, default high, no beta header) controls how thoroughly Claude reasons about each step - fewer tool calls, less preamble, terser output at lower levels. The docs frame the relationship cleanly: effort tunes depth, task budgets tune breadth. On Fable 5 specifically, Anthropic notes lower effort settings still perform well and often exceed the xhigh performance of prior models, which makes medium a legitimate cost lever rather than a quality cliff. We covered the levels in detail in Fable 5 effort levels explained.

Task budget: breadth per loop

The new layer. Worth reaching for when a task has a predictable cost or latency ceiling, and when you would rather get a graceful summary at the limit than a truncation. Adaptive thinking tokens count against it, so thinking naturally scales down as the budget depletes.

The Usage and Cost API: the audit trail

Budgets shape spend before it happens; the Usage and Cost Admin API verifies it afterward. GET /v1/organizations/usage_report/messages returns token usage bucketed at 1m, 1h, or 1d granularity, groupable by model, workspace, API key, and service tier; GET /v1/organizations/cost_report returns USD costs at daily granularity. Both require an Admin API key (the sk-ant-admin... kind), and data typically lands within five minutes of request completion, with polling supported at once per minute. If you run parallel agent fleets, this is the layer that catches what per-task budgets cannot - we walked that math in what parallel Claude agents actually cost.
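For the audit side, a minimal standard-library sketch that builds (without sending) the usage-report request. The query parameter names (`starting_at`, `ending_at`, `bucket_width`, `group_by[]`), the RFC 3339 timestamp format, and the `anthropic-version` header value reflect our reading of the Usage and Cost API docs and should be checked against the current reference; `usage_report_request` is a hypothetical helper.

```python
import urllib.parse
import urllib.request

ADMIN_API_BASE = "https://api.anthropic.com/v1/organizations"


def usage_report_request(admin_key: str, starting_at: str, ending_at: str,
                         bucket_width: str = "1d",
                         group_by: tuple = ("model",)) -> urllib.request.Request:
    """Build, but do not send, a GET for the messages usage report."""
    if bucket_width not in ("1m", "1h", "1d"):
        raise ValueError("bucket_width must be '1m', '1h', or '1d'")
    params = [
        ("starting_at", starting_at),  # assumed RFC 3339 timestamps
        ("ending_at", ending_at),
        ("bucket_width", bucket_width),
    ]
    params += [("group_by[]", key) for key in group_by]  # repeatable grouping key
    url = f"{ADMIN_API_BASE}/usage_report/messages?{urllib.parse.urlencode(params)}"
    return urllib.request.Request(url, headers={
        "x-api-key": admin_key,  # must be an Admin API key, not a standard API key
        "anthropic-version": "2023-06-01",
    })


req = usage_report_request("YOUR_ADMIN_KEY", "2026-06-01T00:00:00Z", "2026-06-08T00:00:00Z")
print(req.full_url)
# To actually send it: json.load(urllib.request.urlopen(req))
```

Because data typically lands within about five minutes and polling is supported at once per minute, a scheduled job calling this well under that rate is plenty for weekly reconciliation.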

One honest footnote: the task budgets feature is flagged ZDR-eligible in the docs, but Fable 5 itself requires 30-day data retention, so that distinction only matters on Opus 4.8 or 4.7 under a zero-data-retention arrangement.

A Sensible Default Configuration

A reasonable starting point for a Fable 5 agent loop with bounded spend:

```python
output_config = {
    "effort": "high",  # default; drop to medium for routine work
    "task_budget": {"type": "tokens", "total": 200000},  # sized from your p99
}
request_args = dict(
    max_tokens=64000,  # per-request backstop; stream anything this large
    betas=["task-budgets-2026-03-13"],
)
```

Set the budget once, do not mutate remaining unless you compact, keep max_tokens as the per-request backstop, and reconcile weekly against the Cost API. For deciding whether the workload should be on Fable 5 at all at $10/$50, the cost-per-task analysis is the place to start.

FAQ

What models support task budgets?

Task budgets are in beta on Claude Fable 5, Claude Mythos 5, Claude Opus 4.8, and Claude Opus 4.7, enabled with the task-budgets-2026-03-13 beta header. Opus 4.6, Sonnet 4.6, and Haiku 4.5 do not support the feature, and it is not available in Claude Code or Cowork - only via the Messages API directly.

Is a task budget a hard limit on spend?

No. Anthropic's docs describe it as a soft hint: the model sees the countdown and self-regulates, but it may exceed the budget to finish an action that would be disruptive to interrupt. The enforced limit is still max_tokens per request. For a true cost ceiling, use both together and monitor with the Usage and Cost API.

What is the minimum task budget?

20,000 tokens. Values below the minimum return a 400 error. Be careful near the floor: a budget that is obviously too small for the task can make the model decline the work, scope it down, or stop early with a partial result.

How is task_budget different from max_tokens?

max_tokens is a hard cap on generated tokens for one request, and the model is not aware of it. task_budget is an advisory allowance across an entire agentic loop - thinking, tool calls, tool results, and output over potentially many requests - and the model actively sees the remaining balance. The two are independent and complementary.

Do tool results count against the task budget?

Yes. The budget counts everything Claude processes in the current loop: generated thinking, tool calls, text output, and the tool results your client appends. Resent conversation history is not counted twice.

Sources

Task budgets (beta) - Anthropic docs (accessed June 11, 2026)
Introducing Claude Fable 5 and Claude Mythos 5 - Anthropic docs (accessed June 11, 2026)
Effort - Anthropic docs (accessed June 11, 2026)
Usage and Cost API - Anthropic docs (accessed June 11, 2026)
Pricing - Anthropic docs (accessed June 11, 2026)
