
TL;DR
The latest Claude Code cache-burn debate is not just a quota complaint. It is a reminder that coding agents need cache-hit telemetry, spend ceilings, and repro-grade usage logs.
Claude Code token burn is back in the feed.
The current viral thread started with Alexander Zanfir's writeup, Claude Diagnosed Its Own Cache Bug. The useful part is not whether every claim in the timeline is proven from the outside. The useful part is that a coding agent was asked to audit its own usage, found suspicious cache-flush behavior, and produced a trail that other users could argue with.
That is where the AI coding market is headed. Not "trust the quota bar." Not "trust a Reddit screenshot." Agent usage needs repro-grade observability.
If you are already running Claude Code, this belongs next to the Claude Code usage limits playbook, agent FinOps, and the recent Claude Code ops release. The product keeps getting more capable. The accounting layer has to catch up.
Anthropic did publish an official postmortem on April 23: An update on recent Claude Code quality reports. It traced the recent quality issues to three separate changes.
The cache section matters most for token burn. Anthropic says the bug caused old thinking to be cleared every turn after a stale session crossed an idle threshold. That made Claude seem forgetful and repetitive, and Anthropic wrote that it likely drove reports of usage limits draining faster than expected.
So the simplified take, "Anthropic never acknowledged anything," is wrong. Zanfir's article now includes a correction on that point.
But the opposing simplified take, "the postmortem means this is over," is also too neat. Users are still reporting confusing usage behavior, the community is still building monitors and workarounds, and Anthropic's own support docs still explain usage in broad plan-level terms rather than session-level cache health.
The lesson is not that every complaint is a confirmed bug. The lesson is that coding-agent usage needs better local evidence.
Prompt caching is usually explained as infrastructure. It should be treated as product behavior.
When a coding agent is working in a large repo, the difference between a healthy cache and a broken cache can be the difference between a useful Max session and a five-hour reset that arrives before the patch is done. Anthropic's usage-limit docs say usage depends on conversation length, model, features, and product surface. Their cost-management docs also point API users toward historical usage and workspace spend limits.
That is useful, but it is not enough for serious agent work.
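To make the stakes concrete, here is a rough cost model for a single turn against a large repo context. The multipliers follow the shape of Anthropic's published API cache pricing (reads at roughly 0.1x the base input price, writes at roughly 1.25x); plan quotas are not metered in dollars, but the ratio carries over.

```python
# Rough cost model for one turn that resends a 150k-token repo prefix.
# Multipliers are an assumption borrowed from Anthropic's API pricing
# shape: cache reads ~0.1x base input, cache writes ~1.25x base input.

BASE = 1.0  # relative cost per input token


def turn_cost(context_tokens, new_tokens, cached):
    if cached:
        # Prefix read from cache; only the new tokens pay full price.
        return context_tokens * 0.1 * BASE + new_tokens * BASE
    # Cache miss: the whole prefix is rewritten at the write premium.
    return (context_tokens + new_tokens) * 1.25 * BASE


healthy = turn_cost(150_000, 2_000, cached=True)
broken = turn_cost(150_000, 2_000, cached=False)
print(f"healthy turn: {healthy:,.0f}")    # 17,000
print(f"broken turn:  {broken:,.0f}")     # 190,000
print(f"ratio: {broken / healthy:.1f}x")  # 11.2x
```

An order-of-magnitude gap per turn is exactly the kind of difference that turns a productive Max session into an early five-hour reset.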
A developer running a long Claude Code session needs to know whether context is being read from cache or rebuilt from scratch, how much quota each turn consumes, and how close the session is to its reset.
That is not billing trivia. It changes whether you continue the session, compact, restart, split the task, switch models, or stop and file a bug.
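A hypothetical decision helper shows how those counters would feed the continue/compact/restart choice. The field names mirror the counters proposed later in this piece; the thresholds are invented for illustration, not tuned values.

```python
# Hypothetical decision helper: maps usage counters to a session action.
# Thresholds (0.5 hit ratio, 60-minute age) are illustrative, not tuned.

def next_action(cache_read, cache_write, uncached_input, session_age_min):
    total_input = cache_read + cache_write + uncached_input
    if total_input == 0:
        return "continue"
    hit_ratio = cache_read / total_input
    if cache_write > cache_read and session_age_min > 60:
        return "restart"   # stale session rebuilding its prefix every turn
    if hit_ratio < 0.5:
        return "compact"   # context churn is eating quota
    return "continue"


print(next_action(cache_read=120_000, cache_write=5_000,
                  uncached_input=3_000, session_age_min=40))   # continue
print(next_action(cache_read=10_000, cache_write=140_000,
                  uncached_input=2_000, session_age_min=90))   # restart
```

The point is not these particular thresholds; it is that the decision becomes mechanical once the counters exist.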
This is why the most interesting GitHub signal is not another wrapper promising free usage. It is tooling like cc-cache-monitor, which tries to inspect Claude Code logs and surface cache behavior.
Whether that specific project becomes the standard is less important than the pattern. Developers want the agent equivalent of a network waterfall: a per-turn view of what was read from cache, what was rewritten, and what each tool call cost.
That is the same argument behind agent receipts. Once agents run for hours, "it felt expensive" is not acceptable debugging data.
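In the spirit of tools like cc-cache-monitor, a per-turn waterfall can be sketched from JSONL transcripts. The usage field names below match Anthropic's Messages API response format; the transcript layout is an assumption, so adapt the record shape to what your install actually writes.

```python
# Sketch: build a per-turn usage waterfall from JSONL transcript lines.
# Usage keys match Anthropic's Messages API; the surrounding record
# shape is an assumption about local Claude Code logs.
import json


def waterfall(lines):
    rows = []
    for line in lines:
        rec = json.loads(line)
        usage = rec.get("usage")
        if not usage:
            continue  # skip records without usage data
        rows.append({
            "turn": len(rows) + 1,
            "cache_read": usage.get("cache_read_input_tokens", 0),
            "cache_write": usage.get("cache_creation_input_tokens", 0),
            "uncached": usage.get("input_tokens", 0),
            "output": usage.get("output_tokens", 0),
        })
    return rows


sample = [
    '{"usage": {"cache_read_input_tokens": 90000, "cache_creation_input_tokens": 0, "input_tokens": 1200, "output_tokens": 800}}',
    '{"usage": {"cache_read_input_tokens": 0, "cache_creation_input_tokens": 91000, "input_tokens": 1500, "output_tokens": 900}}',
]
for row in waterfall(sample):
    print(row)
```

Even this crude view makes the second turn's problem obvious: 91k tokens of cache writes where the first turn paid 90k in cheap reads.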
There is a fair critique of the community reaction: local reverse engineering can overfit.
Claude Code is a hosted product, a local CLI, an API client, a model harness, a prompt layer, and a quota system at the same time. A user can observe symptoms, logs, and billing effects, but not every server-side decision. Cache behavior can change because of TTLs, model routing, product experiments, stale sessions, prompt changes, or user configuration.
That means public bug claims should be written with care.
But that is exactly why first-party observability matters. When the official product does not expose enough session-level telemetry, the community fills the gap with scripts, screenshots, Reddit threads, and partial reconstructions. Some will be right. Some will be wrong. All of them become louder than they need to be because the product does not provide the obvious facts.
Claude Code does not need to expose private chain-of-thought or internal prompts to fix this class of problem. It needs operational counters.
Minimum viable usage telemetry:
| Counter | Why it matters |
|---|---|
| `cache_read_tokens` | Shows whether reused context is actually cheap |
| `cache_write_tokens` | Shows when the session is rebuilding expensive prefixes |
| `uncached_input_tokens` | Separates real new work from repeated context cost |
| `output_tokens` | Identifies verbosity and overthinking failures |
| `thinking_budget` | Shows whether effort settings are driving cost |
| `tool_call_count` | Catches runaway searches, MCP loops, and file rereads |
| `session_age` | Makes idle-resume behavior visible |
| `estimated_plan_usage` | Translates technical counters into quota impact |
Expose it in /usage, export it as JSON, and let hooks read it. That would make Claude Code easier to trust without weakening the product.
For teams, the same shape should become an OpenTelemetry stream. We covered the broader managed-agent FinOps problem, but Claude Code is the cleanest consumer example: the user needs one trace per agent run, with model calls and tool calls under it, tagged with usage counters and cost estimates.
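The trace shape is simple to model. The sketch below uses plain dicts to show one root span per agent run with model and tool calls nested under it, each tagged with usage counters; a real pipeline would emit the same structure through the OpenTelemetry SDK. The span names and attributes are illustrative, not an existing schema.

```python
# Illustrative trace shape: one root span per agent run, child spans for
# model and tool calls, each carrying usage-counter attributes. Plain
# dicts stand in for OpenTelemetry spans here.

def span(name, kind, attrs, children=None):
    return {"name": name, "kind": kind, "attrs": attrs,
            "children": children or []}


run = span("claude-code.run", "agent", {"model": "example-model"}, [
    span("model.call", "llm", {
        "cache_read_tokens": 90_000,
        "cache_write_tokens": 0,
        "uncached_input_tokens": 1_200,
        "output_tokens": 800,
    }),
    span("tool.call", "tool", {"tool": "Read", "tool_call_count": 1}),
])


def total(s, key):
    """Sum an attribute over a span and all of its descendants."""
    return s["attrs"].get(key, 0) + sum(total(c, key) for c in s["children"])


print("run cache reads:", total(run, "cache_read_tokens"))  # 90000
```

Rolling counters up the span tree is what turns "the run felt expensive" into "turn 14 rebuilt the prefix and tool calls tripled."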
Do not wait for the perfect official dashboard.
Use `/compact` or split tasks before the context gets huge.

The goal is not paranoia. The goal is to make usage complaints debuggable.
The cache-burn controversy is not a reason to abandon Claude Code. It is a reason to operate it like infrastructure.
Claude Code is becoming a serious agent runtime: subagents, hooks, MCP, worktrees, skills, plugins, and long-running loops. Serious runtimes need serious counters. If prompt caching saves quota, developers should be able to see it. If a stale session starts rebuilding context, developers should be able to catch it before the five-hour reset.
The next differentiator in AI coding tools will not just be model quality. It will be whether the tool can explain what it spent.
**Why can Claude Code usage drain faster than expected?**

Claude Code usage depends on model choice, effort setting, conversation length, tool use, attached context, and cache behavior. If a long session repeatedly rebuilds context instead of reading from cache, quota can drain much faster than the visible response length suggests.
**Did Anthropic acknowledge a cache bug?**

Yes. Anthropic's April 23 postmortem says a stale-session thinking-cache bug caused prior reasoning to be dropped every turn after an idle threshold and likely contributed to reports of usage limits draining faster than expected. Anthropic says that specific issue was fixed on April 10 in v2.1.101.
**Does the fix explain every current complaint?**

No. Current reports can come from old client versions, long context, effort settings, MCP behavior, subagents, server-side cache eviction, or unrelated product issues. That is why session-level telemetry matters.
**How can you check your own sessions?**

Start by checking Claude Code's built-in usage view and keeping session metadata for suspicious runs. Community tools like cc-cache-monitor are emerging to inspect local logs, but treat them as diagnostic aids rather than official billing truth.
**What counters belong in `/usage`?**

At minimum: cached input tokens, uncached input tokens, cache writes, output tokens, thinking budget, tool-call count, session age, model, effort setting, and estimated quota impact per turn.
**Should teams avoid long sessions?**

No. Long sessions are still useful for deep coding work. Teams should add iteration caps, stop hooks, fresh-session checkpoints, and usage telemetry so long runs fail visibly instead of quietly burning quota.
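A budget-ceiling hook can be sketched today. Claude Code hooks receive JSON on stdin and can signal failure with a nonzero exit code; the `usage` payload below is hypothetical, since the product does not yet pass these counters to hooks, so in practice you would derive the numbers from local transcripts.

```python
# Sketch of a budget-ceiling hook body. The "usage" payload is
# hypothetical -- today these numbers would come from local transcripts,
# not from the hook's stdin JSON.

TURN_CAP = 50                  # illustrative iteration cap
CACHE_WRITE_CEILING = 500_000  # illustrative spend ceiling in tokens


def check(payload: dict):
    """Return a warning string when a ceiling is crossed, else None."""
    usage = payload.get("usage", {})
    if usage.get("turn_count", 0) > TURN_CAP:
        return "iteration cap hit: checkpoint and start a fresh session"
    if usage.get("cache_write_tokens", 0) > CACHE_WRITE_CEILING:
        return "cache writes exploding: session is likely rebuilding context"
    return None


# In a real hook script this payload would be json.load(sys.stdin),
# and a non-None result would print to stderr before a nonzero exit.
sample = {"usage": {"turn_count": 62, "cache_write_tokens": 120_000}}
print(check(sample))  # iteration cap hit: checkpoint and start a fresh session
```

Ceilings like these are how long runs fail visibly instead of quietly burning quota.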