
Envoy AI Gateway 1.0 is production-ready. The useful question for builders is when an Envoy-based LLM gateway beats direct SDK calls, LiteLLM, OpenRouter, or a hosted AI gateway.

A viral Hacker News thread about AI affordability points at the right problem, but developer teams need a more useful cost model, one that counts retries, cache misses, review time, routing, and failed agent loops.

Sakana says Fugu Ultra competes with Fable, Mythos, GPT-5.5, Gemini, and Opus by orchestrating models instead of being one giant model. Here is what the benchmarks show, what is novel, and what still needs proof.

Sakana Fugu makes a timely argument for model routing: frontier performance should come from swappable systems, not a hard dependency on one proprietary API.

Sakana Fugu Ultra is not just another giant model. It is a learned orchestration layer that routes work across expert models, claims benchmark results on par with frontier models, and makes a serious case for multi-model AI systems.

No single model wins every task anymore, and the companies that never trained one (Factory, Devin, Perplexity, Cursor, OpenCode) are turning that into a moat. Here is how model routing works, why open weights and neoclouds make it cheap, and the honest counter-argument.

DeepSeek V4 Pro scores 63.5 on SWE-bench Verified at $0.435/$0.87 per million tokens, and Flash runs agent inner loops for cents. Here is the worked cost math, the Flash-vs-Pro split, and a clear guide on when to route to DeepSeek instead of a frontier model.

Factory.ai shipped a router that auto-picks the model for each Droid session and fails over across providers. The vendor claims 20-25% lower token spend and 99.9%+ request reliability. Here is what the product actually does, which claims are vendor claims, and whether a router beats DIY routing for your team.

Perplexity launched a $200-a-month agent that coordinates 19 models and argues that orchestration, not the model, is the product. Here is the strategic case for why the durable, defensible layer in AI sits next to the labs, not inside them, and what "token value per watt per user" actually means for builders.

OpenRouter Fusion turns multi-model panels into an API feature. The useful lesson is not to run every prompt through more models. It is to define when a task deserves an expensive second opinion.

Factory AI's Droid agent surfaces a new competitive front in coding tools: cost per completed task. Here is what its architecture reveals about where the whole industry is heading.

A first-hand visit to DeepSeek HQ reveals something more interesting than benchmark scores: a 300-person company that treats AI as infrastructure, not eschatology, and what that means for API pricing everywhere.