TL;DR
Million-token context, agent teams that coordinate without an orchestrator, and benchmark scores that push the frontier. Opus 4.6 is Anthropic's biggest model drop yet.
Anthropic dropped Claude Opus 4.6 and it's a leap. Not an incremental bump - a leap.
The flagship is now smarter on coding. Thinks more carefully. Plans more deliberately. Sustains agentic tasks for longer. Handles larger codebases without drift. And it has a million tokens of context. That's not a typo.
Let's dig into what matters.
Opus 4.6 wins across most benchmarks, but the story isn't clean. In some categories it's dominant. In others, Opus 4.5 still edges it out. GPT-5.3 (which dropped right after this release) has a few wins too. That's fine. What matters is the pattern.

Agentic terminal coding is a massive jump. This is the real story. If you're using Claude to build software at scale, this model substantially outperforms 4.5, Sonnet, and Gemini 3 Pro. Not marginal. Substantial.
Agentic search is a clean win. Across the board, better than everything else. That matters for RAG pipelines and knowledge-heavy workloads.
Long context retrieval and reasoning are a tier above. Pass a million tokens into this thing and it actually uses them. Opus 4.5 and Sonnet fall behind here. Context doesn't degrade into noise the way it does with smaller models.
| Benchmark | Opus 4.6 | Opus 4.5 | GPT-5.3 | Gemini 3 Pro |
|---|---|---|---|---|
| Agentic Coding | 92.1% | 93.2% | 89.7% | 86.5% |
| Agentic Terminal Coding | 87.4% | 71.2% | 68.9% | 65.3% |
| Agentic Search | 94.6% | 81.3% | 79.8% | 77.2% |
| Multidisciplinary Reasoning (with tools) | 53.1% | 48.7% | 51.2% | 46.9% |
| Long Context Retrieval | 96.8% | 84.2% | - | 82.1% |

Two API features shipped with this.
Context compaction does what you'd expect - prunes tokens intelligently so you can fit more without wasting input cost. It's not magic, but it works.
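Conceptually, compaction is just budget-aware pruning. Here's a minimal sketch in plain Python — the heuristics, message format, and drop policy are assumptions for illustration, not Anthropic's implementation:

```python
# Illustrative sketch of context compaction: keep the system prompt and
# the most recent turns, and drop middle messages (oldest first) until
# the estimated size fits the token budget.

def estimate_tokens(text: str) -> int:
    return len(text) // 4  # rough heuristic: ~4 characters per token

def compact(messages: list[dict], budget: int, keep_recent: int = 2) -> list[dict]:
    """Return a pruned copy of `messages` that fits within `budget` tokens."""
    head = [m for m in messages if m["role"] == "system"]
    tail = messages[-keep_recent:]
    middle = [m for m in messages if m not in head and m not in tail]

    def total(ms: list[dict]) -> int:
        return sum(estimate_tokens(m["content"]) for m in ms)

    while middle and total(head + middle + tail) > budget:
        middle.pop(0)  # drop the oldest non-essential message first
    return head + middle + tail
```

Real compaction is smarter about what it drops (stale tool results, superseded diffs), but the shape is the same: protect the anchors, sacrifice the middle.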
Adaptive thinking is more interesting. The model now decides how much thinking effort a task requires. Simple queries get a quick pass. Complex problems get deeper reasoning. You pay for what you use. Smart.
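You can picture the routing like this — a toy sketch where the tiers, keyword markers, and thresholds are all invented for illustration; the real model makes this call internally:

```python
# Toy sketch of adaptive thinking: pick a reasoning tier from a crude
# complexity heuristic. Tiers, markers, and thresholds are invented -
# this only illustrates the idea of effort scaling with the task.

def thinking_budget(prompt: str) -> str:
    words = prompt.lower().split()
    hard_markers = {"prove", "refactor", "debug", "design", "optimize"}
    if hard_markers & set(words) or len(words) > 150:
        return "deep"      # extended reasoning, higher cost
    if len(words) > 30:
        return "standard"
    return "minimal"       # quick pass for simple queries
```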
Agent teams are the feature that matters for the next 12 months.
Sub-agents have a constraint: they report back to an orchestrator. Everything threads through the main agent. That's limiting when you're running long-horizon tasks. Token budget gets consumed by state synchronization.
Agent teams flip that. Multiple agents coordinate with each other and with shared resources - todo lists, scratch pads, progress files. No central bottleneck. The orchestrator stays clean. Context stays coherent.

You can tab through teammates in real time. Inject instructions. Observe progress. Shift between them like separate Claude Code sessions. Because they are, technically.
The cost scales - you're running multiple full sessions in parallel. But if you're on the Max tier (which anyone serious about agents should be), it's worth it.
Anthropic published a case study. A team of Claude agents built a C compiler. From scratch. 100,000 lines. Compiles Linux 6.9. Can play Doom.
Cost: $20,000. Scale: 2,000+ Claude Code sessions.
The approach matters more than the result.
Write extremely high-quality tests. Let Claude validate its own work. This is how you keep quality from degrading across hundreds of sessions.
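The loop looks roughly like this — a sketch where `run_agent` and `run_tests` are placeholders; in practice the runner would shell out to pytest and the agent call would drive a Claude Code session:

```python
# Sketch of the test-gated loop: work is only accepted when the suite
# passes; otherwise the failure output becomes the next prompt.
# `run_agent` and `run_tests` are placeholders, not a real API.

def validate_and_iterate(run_agent, run_tests, max_rounds: int = 3) -> bool:
    feedback = "Implement the task."
    for _ in range(max_rounds):
        run_agent(feedback)           # agent edits files on disk
        passed, output = run_tests()  # e.g. (returncode == 0, captured log)
        if passed:
            return True               # suite green: accept the work
        feedback = "Tests failed:\n" + output
    return False                      # quality gate held; escalate
```

The point is the gate: the agent never grades its own homework with vibes, only with a suite it can't weaken.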
Offload context to external files. Progress notes. Readme files. Architecture docs. Let the agent reference them instead of keeping everything in the conversation thread.
Inject time awareness. LLMs are time-blind. A task that takes a week feels instant. Anthropic sampled real time at random intervals so the model understood pacing and deadline pressure.
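A rough sketch of that injection — the gap, probability, and format are invented for illustration, and `roll` is injectable only so the behavior can be pinned down deterministically:

```python
# Sketch of random time injection: at intervals, prepend a wall-clock
# note so the model can reason about pacing and deadline pressure.
import random
import time

def maybe_timestamp(message: str, last_stamp: float,
                    min_gap: float = 600.0, roll=random.random):
    """Prepend the current time if enough real time has passed."""
    now = time.time()
    if now - last_stamp >= min_gap and roll() < 0.5:
        stamp = time.strftime("%Y-%m-%d %H:%M", time.localtime(now))
        return f"[current time: {stamp}] {message}", now
    return message, last_stamp
```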
Parallelize by role. Backend engineer. Frontend engineer. Team lead. Each role tackles a different scope. No stepping on toes.
This is the template. You can apply it to codebases, data pipelines, research tasks, anything long-horizon.
Input: $5 per million tokens. Output: $25 per million tokens.
That changes above 200k tokens. Then it gets expensive. If you're using the full million-token context and generating high-volume output, you need to budget for it.
Opus 4.6 is still in beta on the million-token context. Rollout is coming. Costs may shift.
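At the quoted base rates, the arithmetic is simple. Note this ignores the long-context premium above 200k input tokens, which isn't specified here:

```python
# Back-of-envelope request cost at the quoted base rates. The premium
# above 200k input tokens isn't specified, so base rates apply throughout.

INPUT_PER_M = 5.00    # $ per million input tokens
OUTPUT_PER_M = 25.00  # $ per million output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000

# A full-context call - 1M tokens in, 8k out - runs about $5.20 before any premium.
```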
Be honest about the gaps.
Opus 4.5 still wins on some pure knowledge tasks. GPT-5.3 outperforms on a few benchmarks that Anthropic didn't lead on. That's expected. There's no single best model anymore. You pick the right tool for the job.
For agentic work at scale, reasoning with massive context, and long-horizon coding tasks, Opus 4.6 is the frontier.
Start with a small task. Get the shape of coordination right before scaling up.