TL;DR
Task budgets give Claude a token countdown for the whole agentic loop, so the model paces itself instead of discovering the limit when max_tokens truncates it. Here is how the beta works on Fable 5, what it does not enforce, and where it fits next to effort and the Usage API.
Last updated: June 11, 2026
Most cost controls for LLM agents are reactive. max_tokens cuts the response off after the spend has happened. The Usage API tells you about the damage roughly five minutes after the fact. Billing alerts tell you the next morning. Task budgets, a beta feature Anthropic supports on Claude Fable 5 from launch day, are the first control that works the other way around: you tell the model how many tokens it has for the whole job, and the model sees a running countdown and paces itself.
That distinction matters at Fable 5 prices. The model bills $10 per million input tokens and $50 per million output tokens, double Opus 4.8 on every token type, and the new tokenizer used by Opus 4.7 and later can produce up to 35% more tokens for the same text. An agent loop quietly running all night at those rates is exactly the scenario behind the $400 overnight bill. Task budgets will not make that impossible - they are advisory, a caveat we will get to - but they change the failure mode from "truncated mid-action" to "wrapped up gracefully with a partial report."
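The arithmetic behind "double Opus 4.8" and "up to 35% more tokens" is easy to sketch. The snippet below uses only the $10/$50 per-million prices quoted above. It ignores prompt caching and other discounts. The 30-request, 150k-input, 8k-output overnight loop is a hypothetical workload, not a measured one.

```python
# Fable 5 list prices as quoted in this article (USD per million tokens).
INPUT_USD_PER_MTOK = 10.00
OUTPUT_USD_PER_MTOK = 50.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """List-price cost; ignores prompt caching, batch, and other discounts."""
    return (input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000


def inflated_token_count(old_tokenizer_tokens: int, inflation: float = 0.35) -> int:
    """Worst-case token count for the same text under the Opus 4.7+ tokenizer."""
    return round(old_tokenizer_tokens * (1 + inflation))


if __name__ == "__main__":
    # Hypothetical overnight loop: 30 requests, each resending ~150k tokens of
    # history (billing charges the full input every time) and generating ~8k tokens.
    requests = 30
    print(f"${estimate_cost_usd(150_000 * requests, 8_000 * requests):.2f}")  # $57.00

    # Same text, counted at the worst-case 35% tokenizer inflation.
    inflated = estimate_cost_usd(inflated_token_count(150_000) * requests,
                                 inflated_token_count(8_000) * requests)
    print(f"${inflated:.2f}")  # $76.95
```

Nothing in this calculation depends on a budget. It shows why pacing matters: every extra turn resends the whole history at the input rate.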
Per Anthropic's task budgets documentation, a task budget is a token allowance for a full agentic loop: thinking, tool calls, tool results, and final output, potentially spanning many API requests. The server injects a budget-countdown marker into the conversation that only the model can see. As Claude generates thinking and tool calls, and as it processes tool results, the countdown drops, and the model uses that signal to prioritize work and finish cleanly as the budget runs out.
Three properties define the feature:
- It is advisory, not enforced. The model sees the countdown and self-regulates, but it may exceed the budget to finish an action that would be disruptive to interrupt. That is unlike max_tokens, which truncates with stop_reason: "max_tokens".
- It spans the whole loop. max_tokens caps one response; a task budget covers everything the model processes across the whole multi-turn tool loop. The two values are independent, and neither has to be smaller than the other.
- Model support is narrow. The beta is currently supported on Claude Fable 5, Claude Mythos 5, Claude Opus 4.8, and Claude Opus 4.7. It is not supported on Opus 4.6, Sonnet 4.6, or Haiku 4.5. Notably, it is not available in Claude Code or Cowork: this is a Messages API feature only. The Fable 5 launch documentation lists task budgets among the features supported on day one.
You opt in with the task-budgets-2026-03-13 beta header and add a task_budget object to output_config. This is the documented shape, adapted to Fable 5 (the docs' own example uses Opus 4.8; the request shape is identical):
```python
import anthropic

client = anthropic.Anthropic()

with client.beta.messages.stream(
    model="claude-fable-5",
    max_tokens=128000,
    output_config={
        "effort": "high",
        "task_budget": {"type": "tokens", "total": 64000},
    },
    messages=[
        {"role": "user", "content": "Review the codebase and propose a refactor plan."}
    ],
    betas=["task-budgets-2026-03-13"],
) as stream:
    response = stream.get_final_message()

print(response.usage)
```
The task_budget object takes three fields:
- type: always "tokens".
- total: the token allowance for the loop. The minimum is 20,000; lower values return a 400 error.
- remaining (optional): a carried-over remainder, defaulting to total when omitted. More on when to use this below.

On Fable 5 you do not pass a thinking parameter at all, because adaptive thinking is always on. That makes effort plus task_budget inside output_config the complete spend-shaping surface for a request.
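The three fields can be assembled by a small helper that fails fast on the client before a request ever hits the 400. This is a sketch; build_task_budget is a hypothetical helper, not part of the Anthropic SDK. The 20,000 floor comes from the docs. The check that remaining lies between 0 and total is an assumption of this sketch, not a documented rule.

```python
from typing import Optional

MIN_TASK_BUDGET = 20_000  # documented minimum; lower totals return a 400 error


def build_task_budget(total: int, remaining: Optional[int] = None) -> dict:
    """Build the task_budget object for output_config (hypothetical helper).

    Leave `remaining` out in normal loops so it defaults to total and the
    server tracks the countdown; pass it only after compacting context.
    """
    if total < MIN_TASK_BUDGET:
        raise ValueError(f"task_budget total must be >= {MIN_TASK_BUDGET}, got {total}")
    budget = {"type": "tokens", "total": total}
    if remaining is not None:
        # Assumed sanity bound, not a documented API constraint.
        if not 0 <= remaining <= total:
            raise ValueError("remaining must be between 0 and total")
        budget["remaining"] = remaining
    return budget


if __name__ == "__main__":
    output_config = {"effort": "high", "task_budget": build_task_budget(64_000)}
    print(output_config)
    # {'effort': 'high', 'task_budget': {'type': 'tokens', 'total': 64000}}
```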
This is the part most likely to trip you up. The budget counts what Claude sees in the current loop, not what your client transmits. In a normal agentic loop you resend the full conversation history on every request, so your payload grows turn over turn - but resent history is not counted against the budget again. Only new content is: the tokens Claude generates this turn, plus the tool results you append.
The docs walk a worked example with a 100,000-token budget on a security-audit loop. Turn one costs about 5,000 tokens of thinking plus a tool call. Turn two adds a 2,800-token tool result and 4,000 tokens of generation. Turn three adds a 1,200-token result and a 6,000-token final report. Total counted against the budget: 19,000 tokens, even though the client transmitted over 20,000 tokens of cumulative payload across the three requests.
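The gap between what the budget counts and what the client sends can be reproduced from the three turns above. This is a sketch: the per-turn figures are the documented ones. The example gives no size for the initial user prompt, so prompt_tokens defaults to zero here and only inflates the transmitted side when set.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    tool_result_tokens: int  # tool result appended to history before this request
    generated_tokens: int    # thinking + tool calls + text Claude produced this turn


# The documented security-audit loop (100,000-token budget).
AUDIT_TURNS = [
    Turn(tool_result_tokens=0, generated_tokens=5_000),      # turn 1: thinking + tool call
    Turn(tool_result_tokens=2_800, generated_tokens=4_000),  # turn 2
    Turn(tool_result_tokens=1_200, generated_tokens=6_000),  # turn 3: final report
]


def budget_vs_payload(turns: list, prompt_tokens: int = 0) -> tuple:
    """Return (tokens counted against the budget, cumulative tokens transmitted)."""
    history = prompt_tokens
    counted = 0
    transmitted = 0
    for turn in turns:
        history += turn.tool_result_tokens  # the new tool result joins the history
        transmitted += history              # the client resends the full history
        counted += turn.tool_result_tokens + turn.generated_tokens  # only new content counts
        history += turn.generated_tokens    # Claude's output is resent next turn
    return counted, transmitted


if __name__ == "__main__":
    print(budget_vs_payload(AUDIT_TURNS))  # (19000, 20800)
```

Resent history is exactly the difference between the two numbers, which is why mirroring your payload size into the budget over-counts.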
Two practical consequences:
- Do not decrement remaining yourself. If you lower remaining on each follow-up while also resending full history, the model sees an under-reported budget and wraps up earlier than it should. The docs' guidance: set a generous budget once and let the server track it.
- Mutating remaining also costs you cache hits. Changing the remaining value on every request invalidates any cache prefix that contains it. If you have tuned your loop using Fable 5's prompt caching economics, a mutated budget field can quietly erase those savings.

The one sanctioned use of remaining is compaction. If your loop summarizes or rewrites earlier context between requests, the server loses track of pre-compaction spend. Pass remaining: total - tokens_spent_so_far on the next request to keep the countdown honest.
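Keeping a client-side tally is only worth doing for the compaction case. The sketch below assumes a hypothetical LoopBudgetTracker, which is not an SDK class. It assumes you log usage.output_tokens per response and count tool-result tokens yourself. Clamping remaining at zero is this sketch's assumption, because the budget is advisory and spend can overshoot total.

```python
class LoopBudgetTracker:
    """Hypothetical client-side tally of task-budget spend for one agentic loop."""

    def __init__(self, total: int):
        self.total = total
        self.spent = 0

    def record_turn(self, generated_tokens: int, tool_result_tokens: int = 0) -> None:
        # generated_tokens: e.g. response.usage.output_tokens for this request
        # tool_result_tokens: tokens in the tool result you appended (counted by you)
        self.spent += generated_tokens + tool_result_tokens

    def task_budget(self, compacted: bool) -> dict:
        """Budget object for the next request.

        Without compaction, send the identical object every time so the server
        keeps the countdown and the cache prefix stays stable. After compaction
        the server has lost pre-compaction spend, so pass remaining explicitly.
        """
        budget = {"type": "tokens", "total": self.total}
        if compacted:
            budget["remaining"] = max(self.total - self.spent, 0)  # assumed clamp
        return budget


if __name__ == "__main__":
    tracker = LoopBudgetTracker(100_000)
    tracker.record_turn(5_000)
    tracker.record_turn(4_000, tool_result_tokens=2_800)
    tracker.record_turn(6_000, tool_result_tokens=1_200)
    print(tracker.task_budget(compacted=False))  # {'type': 'tokens', 'total': 100000}
    print(tracker.task_budget(compacted=True))   # {'type': 'tokens', 'total': 100000, 'remaining': 81000}
```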
The documentation warns that a budget clearly insufficient for the task can cause refusal-like behavior. Give Fable 5 a 20,000-token budget for a multi-hour coding task and it may decline to start, scope the work down aggressively, or stop early with a partial result rather than begin something it cannot finish. If you see unexpected refusals or premature stops after adding a budget, raise the budget before debugging anything else.
The sizing method Anthropic recommends is empirical: run a representative sample of tasks without a budget, sum usage.output_tokens plus tool-result tokens across every request in each loop, and start from the p99 of that distribution. Then tune down and re-test. This pairs naturally with the measurement harness you probably already built if you have done production cost modeling for Fable 5 - the same per-task token distribution drives both exercises.
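The recommended sizing method can be computed from logged usage. This is a minimal sketch. It assumes your harness stores one dict per request with output_tokens and tool_result_tokens keys; that record shape is hypothetical. It uses the nearest-rank definition of p99. Raising the result to the 20,000 minimum is a practical guard added here, not part of the documented method.

```python
import math


def loop_tokens(requests: list) -> int:
    """Sum output tokens plus tool-result tokens across every request in one loop."""
    return sum(r["output_tokens"] + r.get("tool_result_tokens", 0) for r in requests)


def starting_budget(loops: list, percentile: float = 99.0, floor: int = 20_000) -> int:
    """Nearest-rank percentile of per-loop spend, raised to the API minimum."""
    totals = sorted(loop_tokens(loop) for loop in loops)
    if not totals:
        raise ValueError("need at least one sampled loop")
    rank = max(math.ceil(percentile * len(totals) / 100), 1)  # 1-based nearest rank
    return max(totals[rank - 1], floor)


if __name__ == "__main__":
    # Hypothetical sample: 100 loops, each one request long, growing in size.
    sample = [[{"output_tokens": 1_000 * i, "tool_result_tokens": 500}] for i in range(1, 101)]
    print(starting_budget(sample))  # 99500
```

Start from that number, then lower it and re-test as the docs suggest. If premature stops appear, you have gone too far.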
Task budgets are one layer of four, and they only work as designed when the other three are set deliberately.
max_tokens is still the only enforced limit. The docs recommend combining both: task_budget as the target the model paces against, and max_tokens as the ceiling that prevents runaway generation on any single request. At xhigh or max effort, set max_tokens to at least 64,000 so the model has room to think and act per request.
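The layering can be sketched as one helper that returns keyword arguments for client.beta.messages.stream. The function names are hypothetical. The 64,000 floor applies only at xhigh and max, per the guidance above.

```python
BETA_HEADER = "task-budgets-2026-03-13"
MAX_TOKENS_FLOOR_AT_HIGH_EFFORT = 64_000  # docs' guidance for xhigh and max effort


def per_request_max_tokens(effort: str, requested: int) -> int:
    """Raise the per-request ceiling to the recommended floor at xhigh/max effort."""
    if effort in ("xhigh", "max"):
        return max(requested, MAX_TOKENS_FLOOR_AT_HIGH_EFFORT)
    return requested


def layered_request_kwargs(effort: str, budget_total: int, requested_max_tokens: int) -> dict:
    """Combine the enforced per-request cap with the advisory loop-wide target."""
    return {
        "max_tokens": per_request_max_tokens(effort, requested_max_tokens),  # hard cap, one request
        "output_config": {
            "effort": effort,  # depth of reasoning per step
            "task_budget": {"type": "tokens", "total": budget_total},  # breadth across the loop
        },
        "betas": [BETA_HEADER],
    }


# Usage (not executed here):
# client.beta.messages.stream(model="claude-fable-5", messages=msgs,
#                             **layered_request_kwargs("xhigh", 200_000, 32_000))
```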
The effort parameter (low through max, default high, no beta header) controls how thoroughly Claude reasons about each step - fewer tool calls, less preamble, terser output at lower levels. The docs frame the relationship cleanly: effort tunes depth, task budgets tune breadth. On Fable 5 specifically, Anthropic notes lower effort settings still perform well and often exceed the xhigh performance of prior models, which makes medium a legitimate cost lever rather than a quality cliff. We covered the levels in detail in Fable 5 effort levels explained.
Task budgets are the new layer. They are worth reaching for when a task has a predictable cost or latency ceiling, and when you would rather get a graceful summary at the limit than a truncation. Adaptive thinking tokens count against the budget, so thinking naturally scales down as the budget depletes.
Budgets shape spend before it happens; the Usage and Cost Admin API verifies it afterward. GET /v1/organizations/usage_report/messages returns token usage bucketed at 1m, 1h, or 1d granularity, groupable by model, workspace, API key, and service tier; GET /v1/organizations/cost_report returns USD costs at daily granularity. Both require an Admin API key (the sk-ant-admin... kind), and data typically lands within five minutes of request completion, with polling supported at once per minute. If you run parallel agent fleets, this is the layer that catches what per-task budgets cannot - we walked that math in what parallel Claude agents actually cost.
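A usage-report request can be built with the standard library alone. Treat this as a sketch under stated assumptions. The query parameter names (starting_at, ending_at, bucket_width, group_by[]) and the x-api-key and anthropic-version headers reflect my reading of the Admin API reference, so check them against the current docs. The ANTHROPIC_ADMIN_KEY environment variable name is a hypothetical choice.

```python
import json
import os
import urllib.parse
import urllib.request
from typing import Optional, Sequence

API_BASE = "https://api.anthropic.com"
BUCKET_WIDTHS = {"1m", "1h", "1d"}  # granularities listed in this article


def usage_report_request(starting_at: str, ending_at: str, bucket_width: str = "1d",
                         group_by: Sequence[str] = ("model",),
                         admin_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a GET for the Messages usage report."""
    if bucket_width not in BUCKET_WIDTHS:
        raise ValueError(f"bucket_width must be one of {sorted(BUCKET_WIDTHS)}")
    params = [("starting_at", starting_at), ("ending_at", ending_at),
              ("bucket_width", bucket_width)]
    params += [("group_by[]", field) for field in group_by]  # assumed parameter name
    url = f"{API_BASE}/v1/organizations/usage_report/messages?{urllib.parse.urlencode(params)}"
    headers = {
        # Must be an Admin API key; a regular API key will not work here.
        "x-api-key": admin_key or os.environ.get("ANTHROPIC_ADMIN_KEY", ""),
        "anthropic-version": "2023-06-01",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_json(request: urllib.request.Request) -> dict:
    """Send the request. Poll at most once per minute; data lags about five minutes."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

Building the request separately from sending it keeps the query logic testable offline. The same pattern extends to the daily cost report endpoint.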
One honest footnote: the task budgets feature is flagged ZDR-eligible in the docs, but Fable 5 itself requires 30-day data retention, so that distinction only matters on Opus 4.8 or 4.7 under a zero-data-retention arrangement.
A reasonable starting point for a Fable 5 agent loop with bounded spend:
```python
output_config = {
    "effort": "high",  # default; drop to medium for routine work
    "task_budget": {"type": "tokens", "total": 200000},  # sized from your p99
}

request_args = dict(
    max_tokens=64000,  # per-request backstop; stream anything this large
    betas=["task-budgets-2026-03-13"],
)
```
Set the budget once, do not mutate remaining unless you compact, keep max_tokens as the per-request backstop, and reconcile weekly against the Cost API. For deciding whether the workload should be on Fable 5 at all at $10/$50, the cost-per-task analysis is the place to start.
Which models support task budgets?
Task budgets are in beta on Claude Fable 5, Claude Mythos 5, Claude Opus 4.8, and Claude Opus 4.7, enabled with the task-budgets-2026-03-13 beta header. Opus 4.6, Sonnet 4.6, and Haiku 4.5 do not support the feature. It is also not available in Claude Code or Cowork, only via the Messages API directly.
Is a task budget a hard limit?
No. Anthropic's docs describe it as a soft hint: the model sees the countdown and self-regulates, but it may exceed the budget to finish an action that would be disruptive to interrupt. The enforced limit is still max_tokens per request. For a true cost ceiling, use both together and monitor with the Usage and Cost API.
What is the minimum task budget?
20,000 tokens. Values below the minimum return a 400 error. Be careful near the floor: a budget that is obviously too small for the task can make the model decline the work, scope it down, or stop early with a partial result.
How is task_budget different from max_tokens?
max_tokens is a hard cap on generated tokens for one request, and the model is not aware of it. task_budget is an advisory allowance across an entire agentic loop, covering thinking, tool calls, tool results, and output over potentially many requests, and the model actively sees the remaining balance. The two are independent and complementary.
Do tool results count against the task budget?
Yes. The budget counts everything Claude processes in the current loop: generated thinking, tool calls, text output, and the tool results your client appends. Resent conversation history is not counted twice.