
Anthropic's Sonnet 4.6 narrows the gap to Opus on agentic tasks, leads computer use benchmarks, and ships with a beta million-token context window. Here's what actually changed.
2 articles

Same prompt, different models, live comparison. Here is what I learned testing Cursor Composer 2, Kimi, Droid, and MiniMax on 10 real web development tasks.