TL;DR
OpenRouter gives you one API key for 300+ models, automatic fallbacks, and intelligent provider routing. Here is what it actually costs, how to set it up in five minutes, and when you should skip it entirely.
Model fragmentation is the quiet tax on every AI-powered product in 2026. You want Claude for reasoning, Gemini for long-context work, and a cheap open-source model for high-volume classification -- but managing three separate API keys, three billing dashboards, and three failure modes adds up fast. OpenRouter is the answer a lot of teams have landed on: one endpoint, one key, access to hundreds of models, with the routing layer handled for you.
This post covers what OpenRouter actually does, how to wire it up in minutes, what the routing controls look like in practice, and -- crucially -- the cases where going direct to a provider still makes more sense.
Last updated: June 10, 2026
OpenRouter is a unified API proxy that sits between your application and the underlying model providers. You send a standard OpenAI-compatible POST request to https://openrouter.ai/api/v1/chat/completions, specify a model slug like anthropic/claude-sonnet-4.5 or meta-llama/llama-3.3-70b-instruct, and OpenRouter routes it to an available provider for that model, following the routing rules described below.
It is not a fine-tuning platform, not a model host in the traditional sense, and not a replacement for the provider APIs when you need capabilities those APIs expose but OpenRouter does not surface. It is a routing and reliability layer -- and a billing consolidation tool.
The catalog currently sits at 300+ models across dozens of providers. You can browse the full list at openrouter.ai/models or query it programmatically via GET https://openrouter.ai/api/v1/models.
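As a rough sketch of the programmatic route, the TypeScript below fetches the catalog and filters it locally. It assumes the response looks like { data: [...] }, with each entry carrying an id and a pricing object whose prompt and completion values are USD-per-token strings, and that the endpoint needs no API key. ModelEntry, fetchModels, freeModels, and byProvider are hypothetical names; check the Models API reference for the exact fields.

```typescript
// Assumed subset of one entry in the GET /api/v1/models response.
interface ModelEntry {
  id: string; // e.g. "meta-llama/llama-3.3-70b-instruct"
  pricing: { prompt: string; completion: string }; // assumed: USD per token, as strings
}

// Hypothetical helper: fetch the catalog with Node 18+'s global fetch.
// Sends no Authorization header, on the assumption that the endpoint is public.
async function fetchModels(): Promise<ModelEntry[]> {
  const res = await fetch("https://openrouter.ai/api/v1/models");
  if (!res.ok) throw new Error(`Models request failed: ${res.status}`);
  const body = (await res.json()) as { data: ModelEntry[] };
  return body.data;
}

// Keep models whose prompt and completion prices are both "0" (i.e. free).
function freeModels(models: ModelEntry[]): string[] {
  return models
    .filter((m) => Number(m.pricing.prompt) === 0 && Number(m.pricing.completion) === 0)
    .map((m) => m.id);
}

// Keep models from one provider namespace, e.g. "anthropic".
function byProvider(models: ModelEntry[], provider: string): string[] {
  return models.filter((m) => m.id.startsWith(`${provider}/`)).map((m) => m.id);
}
```

Calling fetchModels().then((models) => console.log(freeModels(models))) would then list the slugs of currently free models, subject to the assumptions above.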
You need an OpenRouter account and an API key from openrouter.ai/settings/keys.
Option 1: Raw HTTP (any language)
curl https://openrouter.ai/api/v1/chat/completions \
-H "Authorization: Bearer $OPENROUTER_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "anthropic/claude-sonnet-4.5",
"messages": [{"role": "user", "content": "What is the meaning of life?"}]
}'
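For TypeScript without any SDK, the same request can be sent with the built-in fetch (Node 18+). This is a minimal sketch: buildChatRequest and sendChat are hypothetical helper names, the response type covers only the fields read here, and error handling is limited to surfacing the HTTP status.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build the same request as the curl example: bearer auth plus a JSON body.
function buildChatRequest(apiKey: string, model: string, messages: ChatMessage[]): ChatRequest {
  return {
    url: "https://openrouter.ai/api/v1/chat/completions",
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}

// Send the request and return the first choice's text (OpenAI-compatible response shape).
async function sendChat(apiKey: string, model: string, messages: ChatMessage[]): Promise<string> {
  const { url, init } = buildChatRequest(apiKey, model, messages);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`OpenRouter returned ${res.status}: ${await res.text()}`);
  const data = (await res.json()) as { choices: { message: { content: string } }[] };
  return data.choices[0].message.content;
}
```

Usage would look like sendChat(process.env.OPENROUTER_API_KEY ?? "", "anthropic/claude-sonnet-4.5", [{ role: "user", content: "What is the meaning of life?" }]).then(console.log). Keeping the request builder separate from the network call makes it easy to unit-test without an API key.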
Option 2: OpenAI SDK drop-in (TypeScript)
If you have existing code using the OpenAI SDK, point baseURL at OpenRouter and swap your key:
import OpenAI from 'openai';
const openai = new OpenAI({
baseURL: 'https://openrouter.ai/api/v1',
apiKey: process.env.OPENROUTER_API_KEY,
});
const completion = await openai.chat.completions.create({
model: 'anthropic/claude-sonnet-4.5',
messages: [{ role: 'user', content: 'Explain caching in plain English' }],
});
No other code changes are required: every model slug in the OpenRouter catalog works as the model value.
Option 3: Native SDK
npm install @openrouter/sdk
import { OpenRouter } from '@openrouter/sdk';
const client = new OpenRouter({ apiKey: process.env.OPENROUTER_API_KEY });
const result = await client.chat.send({
model: 'meta-llama/llama-3.3-70b-instruct',
messages: [{ role: 'user', content: 'Summarize this article.' }],
});
console.log(result.choices[0].message.content);
The native SDK adds full TypeScript types auto-generated from OpenRouter's OpenAPI spec and handles streaming, embeddings, and the provider routing fields natively.
This is where OpenRouter earns its keep. By default, the router load-balances across providers for your chosen model, weighted by the inverse square of price, while filtering out providers that have seen recent outages. Provider A at $1/M tokens is nine times more likely to be picked than Provider C at $3/M tokens, because their weights are 1/1² = 1 and 1/3² = 1/9.
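To make the arithmetic concrete, here is a small sketch that turns per-provider prices into selection probabilities using inverse-square weights and drops providers flagged as recently down. It models the default behavior as described in the paragraph above, not OpenRouter's actual code; the ProviderQuote type, the recentOutage flag, and Provider B's $2 price are hypothetical.

```typescript
interface ProviderQuote {
  name: string;
  pricePerMillion: number; // USD per million tokens
  recentOutage: boolean; // hypothetical flag standing in for outage filtering
}

// Weight each healthy provider by 1 / price^2, then normalize to probabilities.
function selectionProbabilities(quotes: ProviderQuote[]): Map<string, number> {
  const healthy = quotes.filter((q) => !q.recentOutage);
  const weights = healthy.map((q) => 1 / q.pricePerMillion ** 2);
  const total = weights.reduce((sum, w) => sum + w, 0);
  const probs = new Map<string, number>();
  healthy.forEach((q, i) => probs.set(q.name, weights[i] / total));
  return probs;
}

const probs = selectionProbabilities([
  { name: "A", pricePerMillion: 1, recentOutage: false },
  { name: "B", pricePerMillion: 2, recentOutage: false },
  { name: "C", pricePerMillion: 3, recentOutage: false },
]);
// A's weight is 1 and C's is 1/9, so their probability ratio is 9.
console.log(probs.get("A")! / probs.get("C")!);
```

Normalizing by the total weight is what turns the raw weights into probabilities; adding or removing a provider changes every probability, but the 9:1 ratio between A and C stays fixed.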
You can override this with the provider object in your request body.
Sort by a specific dimension:
const result = await client.chat.send({
model: 'meta-llama/llama-3.3-70b-instruct',
messages: [{ role: 'user', content: 'Hello' }],
provider: { sort: 'throughput' }, // or 'latency' or 'price'
});
Shortcut slugs let you skip the provider object entirely:
model:nitro -- sorts by throughput (maximum tokens per second)
model:floor -- sorts by price (cheapest available provider)
# Fastest provider for this model
curl https://openrouter.ai/api/v1/chat/completions \
-H "Authorization: Bearer $OPENROUTER_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model": "meta-llama/llama-3.3-70b-instruct:nitro", "messages": [{"role": "user", "content": "Hello"}]}'
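The shortcut and the provider object express the same routing intent. As a sketch with hypothetical helper names, these functions build either form of the request body for a given slug, so you can switch between them without touching call sites. The mapping table only encodes the two shortcuts listed above.

```typescript
type Shortcut = "nitro" | "floor";

// Each shortcut corresponds to a provider.sort value, per the list above.
const SHORTCUT_SORT: Record<Shortcut, "throughput" | "price"> = {
  nitro: "throughput",
  floor: "price",
};

interface RoutedBody {
  model: string;
  provider?: { sort: "throughput" | "price" };
}

// Shortcut form: encode the sort in the slug itself.
function shortcutBody(slug: string, shortcut: Shortcut): RoutedBody {
  return { model: `${slug}:${shortcut}` };
}

// Explicit form: the same intent via the provider object.
function explicitBody(slug: string, shortcut: Shortcut): RoutedBody {
  return { model: slug, provider: { sort: SHORTCUT_SORT[shortcut] } };
}
```

The explicit form is the one to reach for once you need more than a sort, since the other provider fields (order, ignore, thresholds) live in the same object.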
Pin specific providers with fallbacks:
provider: {
order: ['anthropic', 'openai'],
allow_fallbacks: true,
}
Block providers you do not want:
provider: {
ignore: ['provider-slug-a', 'provider-slug-b'],
}
Set performance thresholds using percentiles:
OpenRouter tracks p50, p75, p90, and p99 latency and throughput over a rolling five-minute window. Providers that miss your thresholds are deprioritized rather than blocked, so they remain available as fallbacks:
provider: {
sort: { by: 'price', partition: 'none' },
preferredMaxLatency: { p90: 3 }, // deprioritize if p90 latency exceeds 3s (more than 10% of requests slower than 3s)
preferredMinThroughput: { p50: 100 }, // prefer >100 tokens/sec median
}
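To illustrate what "deprioritize, not block" means, the sketch below orders providers so that those meeting both thresholds come first, cheapest first within each group, while providers that miss a threshold stay in the list as later fallbacks. This is a simplified model of the behavior described above, not OpenRouter's implementation; rankProviders, the ProviderStats shape, and the test numbers are hypothetical.

```typescript
interface ProviderStats {
  name: string;
  pricePerMillion: number; // USD per million tokens
  p90LatencySeconds: number;
  p50TokensPerSecond: number;
}

interface Thresholds {
  maxP90LatencySeconds: number;
  minP50TokensPerSecond: number;
}

// Providers meeting both thresholds first, then the rest; cheapest first within each group.
function rankProviders(providers: ProviderStats[], t: Thresholds): string[] {
  const meets = (p: ProviderStats) =>
    p.p90LatencySeconds <= t.maxP90LatencySeconds && p.p50TokensPerSecond >= t.minP50TokensPerSecond;
  const byPrice = (a: ProviderStats, b: ProviderStats) => a.pricePerMillion - b.pricePerMillion;
  const preferred = providers.filter(meets).sort(byPrice);
  const fallbacks = providers.filter((p) => !meets(p)).sort(byPrice);
  return [...preferred, ...fallbacks].map((p) => p.name);
}
```

With thresholds of p90 at most 3s and p50 at least 100 tokens/sec, a cheap but slow provider drops to the end of the list instead of disappearing, which is the practical difference from the ignore list.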
Data compliance controls:
provider: {
data_collection: 'deny', // skip providers that may log prompts
zdr: true, // restrict to Zero Data Retention endpoints only
}
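These options all live in the same provider object, so it can help to give them one type. The sketch below covers only the fields this post mentions, plus a hypothetical guard, conflictingProviders, that flags a provider listed in both order and ignore. The max_price units (assumed USD per million tokens) and the exact value sets are assumptions; the provider routing docs are the source of truth.

```typescript
// Subset of provider routing fields covered in this post; verify against the current docs.
interface ProviderPreferences {
  order?: string[];
  allow_fallbacks?: boolean;
  ignore?: string[];
  sort?: "price" | "throughput" | "latency";
  data_collection?: "allow" | "deny";
  zdr?: boolean;
  max_price?: { prompt?: number; completion?: number }; // assumed: USD per million tokens
}

// Hypothetical guard: a provider that is both pinned and ignored is likely a config mistake.
function conflictingProviders(prefs: ProviderPreferences): string[] {
  const ignored = new Set(prefs.ignore ?? []);
  return (prefs.order ?? []).filter((slug) => ignored.has(slug));
}

// Example: pinned order with fallbacks, restricted to non-logging, ZDR endpoints.
const strictPrefs: ProviderPreferences = {
  order: ["anthropic", "openai"],
  allow_fallbacks: true,
  data_collection: "deny",
  zdr: true,
};
```

Running a check like this at startup catches contradictory routing config before it reaches production traffic.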
If you do not want to manage model selection at all, set model to openrouter/auto. This uses NotDiamond's routing system to analyze your prompt and pick from a curated pool of high-quality models including Claude Sonnet 4.5, GPT-5.1, Gemini 3.1 Pro, and DeepSeek 3.2 (pool composition may change; check openrouter.ai/openrouter/auto for the current list).
The response includes the model field showing which model was actually selected. For multi-turn conversations, pass a session_id to keep the router pinned to the same model and provider, which also maximizes prompt cache efficiency:
const result = await client.chat.send({
model: 'openrouter/auto',
session_id: 'my-conversation-123',
messages: [{ role: 'user', content: 'Hello' }],
});
console.log('Model selected:', result.model);
Without a session_id, stickiness is inferred automatically after the first prompt cache hit, but explicit session IDs are the safer choice for multi-turn agents.
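For a multi-turn agent, the simplest way to keep the session ID stable is to own it in one place. The sketch below is a hypothetical Conversation class that accumulates messages and stamps every request body with the same session_id, using openrouter/auto as the default model. It only builds request bodies, so you would pass them to whichever client you use; the session_id field name comes from the description above.

```typescript
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

class Conversation {
  private readonly messages: Message[] = [];
  private readonly sessionId: string;
  private readonly model: string;

  // The caller supplies the ID (for example a UUID) so it stays stable across turns.
  constructor(sessionId: string, model: string = "openrouter/auto") {
    this.sessionId = sessionId;
    this.model = model;
  }

  // Append a user turn and return a request body carrying the full history.
  userTurn(content: string) {
    this.messages.push({ role: "user", content });
    return { model: this.model, session_id: this.sessionId, messages: [...this.messages] };
  }

  // Record the assistant's reply so the next turn includes it as context.
  recordReply(content: string): void {
    this.messages.push({ role: "assistant", content });
  }
}
```

Each body gets a copy of the history, so a body already sent is not mutated by later turns.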
OpenRouter passes through provider pricing and adds a markup. The exact percentage markup varies by model and is displayed on each model's page at openrouter.ai/models. Pricing in the Models API is returned as USD per token in the pricing object, and the usage field in each response gives you the actual token counts billed.
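Because pricing comes back as USD per token and usage reports the billed token counts, estimating the cost of a single call is a multiplication. This sketch assumes pricing.prompt and pricing.completion are decimal strings and that usage carries prompt_tokens and completion_tokens in the OpenAI-compatible shape. estimateCostUsd is a hypothetical helper, and it ignores any other price components (per-request or image pricing, for instance).

```typescript
interface Pricing {
  prompt: string; // USD per prompt token, as a decimal string
  completion: string; // USD per completion token
}

interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
}

// Estimate one request's cost from catalog pricing and the response's usage field.
function estimateCostUsd(pricing: Pricing, usage: Usage): number {
  return (
    usage.prompt_tokens * Number(pricing.prompt) +
    usage.completion_tokens * Number(pricing.completion)
  );
}
```

For example, 1,000 prompt tokens at a hypothetical $0.000002 per token plus 500 completion tokens at $0.00001 per token come to $0.002 + $0.005 = $0.007.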
A few things to know:
Models showing "0" in the prompt and completion pricing fields are currently free (rate limits apply -- see the FAQ at openrouter.ai/docs/faq).
The max_price field in the provider object lets you set a hard price cap per request; OpenRouter will not execute the request if no provider meets the threshold.
OpenRouter is not the right tool for every situation. Skip it if:
You have a single-provider product with no fallback requirements. If your entire product runs on one model from one provider and you have a direct API agreement, you are paying the routing markup for no benefit.
You need provider-specific features not exposed through the unified API. Some models have provider-specific parameters, fine-tuning endpoints, or batch APIs that do not map through OpenRouter's standard chat completions interface.
Latency is critical and every millisecond matters. OpenRouter adds a proxy hop between your application and the provider. For real-time, voice-adjacent, or sub-200ms applications, that extra hop may matter.
You have compliance requirements that rule out third-party data proxies. Even with data_collection: 'deny' and ZDR endpoints, your prompts pass through OpenRouter's infrastructure. If your data governance policy prohibits any third-party proxy, go direct.
You are running extremely high volume with negotiated provider rates. At enterprise scale, direct provider relationships with custom pricing may beat OpenRouter's rates even after accounting for the routing overhead you would need to build yourself.
The value proposition is clearest in a few scenarios: multi-model products where you want one integration instead of five; prototype and early-stage work where you are still figuring out which model fits which task; reliability-sensitive applications where automatic fallbacks during provider outages are worth the markup; and cost optimization workloads where the sort: 'price' routing or :floor shortcut meaningfully reduces spend on high-volume, lower-stakes inference.
The percentile-based routing is the most underrated feature for production use. Being able to say "give me the cheapest provider that meets p90 latency under three seconds" is a real operational improvement over manually watching provider status pages.
Can I use the OpenAI SDK with OpenRouter?
Yes. Point the SDK's baseURL to https://openrouter.ai/api/v1 and set your OpenRouter API key. The rest of your code stays unchanged, and every OpenRouter model slug works as the model parameter.
Does OpenRouter support streaming?
Yes. The API supports standard server-sent events (SSE) streaming. The native @openrouter/sdk handles streaming directly, and it also works through the OpenAI SDK drop-in path.
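If you stream with raw fetch instead of an SDK, you parse the server-sent events yourself. A minimal sketch, assuming OpenAI-style chunks: each event line starts with "data: ", carries JSON with the text at choices[0].delta.content, and the stream ends with "data: [DONE]". Lines starting with ":" are SSE comments (OpenRouter may send them as keep-alives) and are skipped. extractDeltas and streamChat are hypothetical names, and setting stream: true in the body is assumed to enable streaming as it does in the OpenAI API.

```typescript
// Pull text deltas out of a block of complete SSE lines.
function extractDeltas(sseText: string): string[] {
  const deltas: string[] = [];
  for (const raw of sseText.split("\n")) {
    const line = raw.trim();
    if (line === "" || line.startsWith(":")) continue; // blank separator or comment
    if (!line.startsWith("data: ")) continue; // ignore other SSE fields
    const payload = line.slice("data: ".length);
    if (payload === "[DONE]") break; // end of stream
    const chunk = JSON.parse(payload) as { choices?: { delta?: { content?: string } }[] };
    const text = chunk.choices?.[0]?.delta?.content;
    if (text) deltas.push(text);
  }
  return deltas;
}

// Stream a completion, calling onText for each delta as it arrives.
async function streamChat(
  apiKey: string,
  body: Record<string, unknown>,
  onText: (text: string) => void,
): Promise<void> {
  const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify({ ...body, stream: true }),
  });
  if (!res.ok || !res.body) throw new Error(`Stream request failed: ${res.status}`);
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const lastNewline = buffer.lastIndexOf("\n");
    if (lastNewline === -1) continue;
    // Parse only complete lines; keep the trailing partial line for the next read.
    extractDeltas(buffer.slice(0, lastNewline)).forEach(onText);
    buffer = buffer.slice(lastNewline + 1);
  }
  extractDeltas(buffer).forEach(onText); // flush anything left after the stream closes
}
```

Buffering on newline boundaries matters because network chunks can split a JSON event in half; parsing each chunk independently would intermittently throw.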
What happens if a provider fails?
By default, allow_fallbacks is true, so OpenRouter automatically tries the next provider in its list if the primary fails. Set allow_fallbacks: false if you need strict control, or set an explicit order array to control the fallback sequence.
How do I find out which model handled a request?
Check the model field in the response object. It contains the actual model slug that handled the request, for example anthropic/claude-sonnet-4.5. This is useful for debugging cost anomalies and for understanding routing patterns over time.
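To turn that field into routing patterns over time, you can tally it across logged responses. A minimal sketch: RoutedResponse is a hypothetical record type holding the response's model field plus its usage counts (assumed to follow the OpenAI-compatible prompt_tokens/completion_tokens shape), and summarizeRouting groups calls and tokens by the model that actually served them.

```typescript
interface RoutedResponse {
  model: string; // slug reported in the response, e.g. "anthropic/claude-sonnet-4.5"
  usage: { prompt_tokens: number; completion_tokens: number };
}

interface ModelSummary {
  calls: number;
  totalTokens: number;
}

// Group logged responses by the model that actually handled them.
function summarizeRouting(responses: RoutedResponse[]): Map<string, ModelSummary> {
  const summary = new Map<string, ModelSummary>();
  for (const r of responses) {
    const entry = summary.get(r.model) ?? { calls: 0, totalTokens: 0 };
    entry.calls += 1;
    entry.totalTokens += r.usage.prompt_tokens + r.usage.completion_tokens;
    summary.set(r.model, entry);
  }
  return summary;
}
```

A sudden shift in which model dominates the summary (for example after switching to openrouter/auto) is often the first clue behind a cost anomaly.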