OpenRouter in 2026: Review, Setup, and When Model Routing Pays

Model fragmentation is the quiet tax on every AI-powered product in 2026. You want Claude for reasoning, Gemini for long-context work, and a cheap open-source model for high-volume classification -- but managing three separate API keys, three billing dashboards, and three failure modes adds up fast. OpenRouter is the answer a lot of teams have landed on: one endpoint, one key, access to hundreds of models, with the routing layer handled for you.

This post covers what OpenRouter actually does, how to wire it up in minutes, what the routing controls look like in practice, and -- crucially -- the cases where going direct to a provider still makes more sense.

Last updated: June 10, 2026

What OpenRouter Is (and Is Not)

OpenRouter is a unified API proxy that sits between your application and the underlying model providers. You send a standard OpenAI-compatible request to POST https://openrouter.ai/api/v1/chat/completions and specify a model slug like anthropic/claude-sonnet-4.5 or meta-llama/llama-3.3-70b-instruct. OpenRouter then forwards the request to one of the providers serving that model, chosen according to the routing rules described below.

It is not a fine-tuning platform, not a model host in the traditional sense, and not a replacement for the provider APIs when you need capabilities those APIs expose but OpenRouter does not surface. It is a routing and reliability layer -- and a billing consolidation tool.

The catalog currently sits at 300+ models across dozens of providers. You can browse the full list at openrouter.ai/models or query it programmatically via GET /api/v1/models.
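If you want the catalog in code, here is a minimal TypeScript sketch. It makes several assumptions: Node 18+ (for the global fetch), an endpoint that can be read without an API key, and a response shaped like { data: [{ id, ... }] }, where each id is a model slug. The ModelsResponse type and both helper names are hypothetical.

```typescript
// Hand-written subset of the catalog response; real entries carry many more fields.
interface ModelsResponse {
  data: { id: string }[]; // id is the model slug, e.g. "anthropic/claude-sonnet-4.5"
}

// Returns slugs sorted alphabetically, optionally limited to one author prefix
// (the part before "/"). Pure function, so it is easy to test offline.
function listModelIds(catalog: ModelsResponse, author?: string): string[] {
  return catalog.data
    .map((m) => m.id)
    .filter((id) => author === undefined || id.split("/")[0] === author)
    .sort();
}

// Downloads the catalog from the Models API (assumed to be publicly readable).
async function fetchCatalog(): Promise<ModelsResponse> {
  const res = await fetch("https://openrouter.ai/api/v1/models");
  if (!res.ok) {
    throw new Error(`Models API returned HTTP ${res.status}`);
  }
  return (await res.json()) as ModelsResponse;
}

// Example: print every Meta Llama slug in the catalog.
// fetchCatalog().then((catalog) => console.log(listModelIds(catalog, "meta-llama")));
```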

Quick Start: Up and Running in Five Minutes

You need an OpenRouter account and an API key from openrouter.ai/settings/keys.

Option 1: Raw HTTP (any language)

curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4.5",
    "messages": [{"role": "user", "content": "What is the meaning of life?"}]
  }'

Option 2: OpenAI SDK drop-in (TypeScript)

If you have existing code using the OpenAI SDK, point baseURL at OpenRouter and swap your key:

import OpenAI from 'openai';

const openai = new OpenAI({
  baseURL: 'https://openrouter.ai/api/v1',
  apiKey: process.env.OPENROUTER_API_KEY,
});

const completion = await openai.chat.completions.create({
  model: 'anthropic/claude-sonnet-4.5',
  messages: [{ role: 'user', content: 'Explain caching in plain English' }],
});

No other code changes required. Every model slug in the OpenRouter catalog works as the model value.

Option 3: Native SDK

npm install @openrouter/sdk

import { OpenRouter } from '@openrouter/sdk';

const client = new OpenRouter({ apiKey: process.env.OPENROUTER_API_KEY });

const result = await client.chat.send({
  model: 'meta-llama/llama-3.3-70b-instruct',
  messages: [{ role: 'user', content: 'Summarize this article.' }],
});

console.log(result.choices[0].message.content);

The native SDK adds full TypeScript types auto-generated from OpenRouter's OpenAPI spec and handles streaming, embeddings, and the provider routing fields natively.

How Provider Routing Works

This is where OpenRouter earns its keep. By default, the router load-balances across the providers serving your chosen model. It first sets aside providers that have seen recent outages, then picks among the rest with probability weighted by the inverse square of price. Provider A at $1/M tokens is nine times more likely to be picked than Provider C at $3/M tokens, because their weights are 1/1^2 = 1 and 1/3^2 = 1/9.
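To make the weighting concrete, here is a small TypeScript sketch of inverse-square price selection as described above. The Provider shape, the healthy flag, and both function names are hypothetical. OpenRouter's real router weighs more signals than this, so read the sketch as a model of the idea rather than as its implementation.

```typescript
// Hypothetical provider record, for illustration only.
interface Provider {
  name: string;
  pricePerMTok: number; // USD per million tokens
  healthy: boolean;     // false if the provider had a recent outage
}

interface Weighted {
  name: string;
  probability: number;
}

// Drops unhealthy providers, then gives each remaining one weight 1 / price^2,
// normalized so the probabilities sum to 1.
function selectionProbabilities(providers: Provider[]): Weighted[] {
  const stable = providers.filter((p) => p.healthy);
  const weights = stable.map((p) => 1 / (p.pricePerMTok * p.pricePerMTok));
  const total = weights.reduce((sum, w) => sum + w, 0);
  return stable.map((p, i) => ({ name: p.name, probability: weights[i] / total }));
}

// Picks one provider at random according to those probabilities.
// `rand` is injectable so the choice can be made deterministic in tests.
function pickProvider(providers: Provider[], rand: () => number = Math.random): string {
  const candidates = selectionProbabilities(providers);
  if (candidates.length === 0) {
    throw new Error("no healthy providers for this model");
  }
  let r = rand();
  for (const c of candidates) {
    r -= c.probability;
    if (r < 0) return c.name;
  }
  // Floating-point rounding can leave r marginally >= 0; fall back to the last candidate.
  return candidates[candidates.length - 1].name;
}
```

Suppose a hypothetical Provider B at $2 recently had an outage and is added to the numbers from the paragraph above. B is set aside, A's weight is 1 against C's 1/9, and A is picked 90% of the time versus 10% for C.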

You can override this with the provider object in your request body.

Sort by a specific dimension:

const result = await client.chat.send({
  model: 'meta-llama/llama-3.3-70b-instruct',
  messages: [{ role: 'user', content: 'Hello' }],
  provider: { sort: 'throughput' }, // or 'latency' or 'price'
});

Shortcut slugs let you skip the provider object entirely:

model:nitro -- sorts by throughput (maximum tokens per second)
model:floor -- sorts by price (cheapest available provider)

# Fastest provider for this model
curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/llama-3.3-70b-instruct:nitro", "messages": [{"role": "user", "content": "Hello"}]}'
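Each suffix stands in for a provider.sort value, so a shortcut slug can be rewritten into the explicit form. The helper below is hypothetical and knows only the two suffixes listed above. Any other slug passes through unchanged.

```typescript
type SortBy = "price" | "throughput" | "latency";

interface RoutedModel {
  model: string;
  provider?: { sort: SortBy };
}

// The two shortcut suffixes described in the text and the sort order each implies.
const SHORTCUTS: Record<string, SortBy> = {
  nitro: "throughput",
  floor: "price",
};

// Hypothetical helper: "meta-llama/llama-3.3-70b-instruct:nitro" becomes
// { model: "meta-llama/llama-3.3-70b-instruct", provider: { sort: "throughput" } }.
function expandShortcut(slug: string): RoutedModel {
  const idx = slug.lastIndexOf(":");
  if (idx === -1) return { model: slug };
  const suffix = slug.slice(idx + 1);
  // hasOwnProperty avoids matching inherited names such as "constructor".
  if (!Object.prototype.hasOwnProperty.call(SHORTCUTS, suffix)) {
    return { model: slug }; // unknown suffix: leave the slug untouched
  }
  return { model: slug.slice(0, idx), provider: { sort: SHORTCUTS[suffix] } };
}
```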

Pin specific providers in priority order (they only apply if they serve the model you request), keeping fallbacks enabled:

provider: {
  order: ['anthropic', 'openai'],
  allow_fallbacks: true,
}

Block providers you do not want:

provider: {
  ignore: ['provider-slug-a', 'provider-slug-b'],
}
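One way to reason about order, allow_fallbacks, and ignore together is as a candidate list. Ignored providers are removed, pinned providers come first in the order you gave, and the remaining providers follow only when fallbacks are allowed. The sketch below encodes that reading. The function is hypothetical and assumes a non-empty order. It also leaves the unpinned tail in input order, whereas OpenRouter applies its own default routing there.

```typescript
// Hypothetical model of the providers a request may try, in the order they would be tried.
// Assumes `order` is non-empty; `available` lists providers that serve the requested model.
function candidateProviders(
  available: string[],
  order: string[],          // provider.order
  ignore: string[],         // provider.ignore
  allowFallbacks: boolean,  // provider.allow_fallbacks (true by default)
): string[] {
  const allowed = available.filter((p) => ignore.indexOf(p) === -1);
  // Pinned providers first, skipping any that are ignored or don't serve this model.
  const pinned = order.filter((p) => allowed.indexOf(p) !== -1);
  if (!allowFallbacks) {
    return pinned; // strict mode: if every pinned provider fails, the request fails
  }
  // With fallbacks on, every other allowed provider follows the pinned ones.
  const rest = allowed.filter((p) => pinned.indexOf(p) === -1);
  return pinned.concat(rest);
}
```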

Set performance thresholds using percentiles:

OpenRouter tracks p50, p75, p90, and p99 latency and throughput over a rolling five-minute window. You can set thresholds on those percentiles. Providers that miss them are deprioritized rather than blocked, so they remain available as fallbacks:

provider: {
  sort: { by: 'price', partition: 'none' },
  preferredMaxLatency: { p90: 3 }, // deprioritize if p90 latency exceeds 3 s
  preferredMinThroughput: { p50: 100 }, // deprioritize if median throughput is below 100 tokens/s
}
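To see what a p90 threshold actually measures, the sketch below computes a nearest-rank percentile from raw latency samples. It then orders providers cheapest-first, moving any whose p90 latency exceeds the limit to the back instead of dropping them. OpenRouter computes its percentiles server-side. The Stats shape, the samples you would feed it, the nearest-rank method, and the function names are all assumptions for illustration.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Hypothetical per-provider stats.
interface Stats {
  name: string;
  pricePerMTok: number;     // USD per million tokens
  latencySeconds: number[]; // recent latency samples
}

// Cheapest first, but providers whose p90 latency exceeds maxP90 are moved to the back
// (deprioritized, not excluded), mirroring preferredMaxLatency: { p90: maxP90 }.
function rankByPriceWithLatencyPreference(providers: Stats[], maxP90: number): string[] {
  const byPrice = providers.slice().sort((a, b) => a.pricePerMTok - b.pricePerMTok);
  const fast = byPrice.filter((p) => percentile(p.latencySeconds, 90) <= maxP90);
  const slow = byPrice.filter((p) => percentile(p.latencySeconds, 90) > maxP90);
  return fast.concat(slow).map((p) => p.name);
}
```

The ranking implements "give me the cheapest provider that meets p90 latency under three seconds" while still keeping slower providers in the list as fallbacks.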

Data compliance controls:

provider: {
  data_collection: 'deny', // skip providers that may log prompts
  zdr: true,               // restrict to Zero Data Retention endpoints only
}
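The provider fragments in this section all live inside one request body. As a sketch, here is a typed builder for the compliance case plus a plain fetch call to the REST endpoint. The interfaces are a hand-written subset of the fields mentioned in this post, not OpenRouter's official types. The field names follow the snake_case used in the fragments above, and Node 18+ is assumed for the global fetch.

```typescript
// Hand-written subset of the provider object, limited to fields used in this post.
interface ProviderPreferences {
  order?: string[];
  allow_fallbacks?: boolean;
  ignore?: string[];
  sort?: "price" | "throughput" | "latency";
  data_collection?: "allow" | "deny";
  zdr?: boolean;
}

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  model: string;
  messages: ChatMessage[];
  provider?: ProviderPreferences;
}

// Builds a single-turn body restricted to providers that don't collect prompts
// and offer Zero Data Retention endpoints.
function compliantRequest(model: string, prompt: string): ChatRequest {
  return {
    model,
    messages: [{ role: "user", content: prompt }],
    provider: { data_collection: "deny", zdr: true },
  };
}

// Sends a body to the chat completions endpoint and returns the parsed JSON response.
async function send(body: ChatRequest, apiKey: string): Promise<unknown> {
  const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return res.json();
}

// Example:
// send(compliantRequest("meta-llama/llama-3.3-70b-instruct", "Hello"), process.env.OPENROUTER_API_KEY ?? "");
```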

The Auto Router

If you do not want to manage model selection at all, set model to openrouter/auto. This uses NotDiamond's routing system to analyze your prompt and pick from a curated pool of high-quality models including Claude Sonnet 4.5, GPT-5.1, Gemini 3.1 Pro, and DeepSeek V3.2 (pool composition may change; check openrouter.ai/openrouter/auto for the current list).

The response includes the model field showing which model was actually selected. For multi-turn conversations, pass a session_id to keep the router pinned to the same model and provider, which also maximizes prompt cache efficiency:

const result = await client.chat.send({
  model: 'openrouter/auto',
  session_id: 'my-conversation-123',
  messages: [...],
});
console.log('Model selected:', result.model);

Without a session_id, stickiness is inferred automatically after the first prompt cache hit, but explicit session IDs are the safer choice for multi-turn agents.

Pricing: What the Routing Layer Costs

OpenRouter bills model usage at the underlying providers' per-token prices. Rather than adding a per-token markup, it takes its cut through platform fees (for example, a fee when you buy credits), and the current rates are listed on OpenRouter's pricing and FAQ pages. Pricing in the Models API is returned as USD per token in the pricing object, and the usage field in each response gives you the actual token counts billed.

A few things to know:

Models listed as "0" in the prompt or completion pricing fields are currently free (rate limits apply -- see the FAQ at openrouter.ai/docs/faq).
If you bring your own API key (BYOK) for a provider, OpenRouter routes through your key for those endpoints, which changes how OpenRouter's fees apply -- check the BYOK docs for specifics.
The max_price field in the provider object sets a ceiling on the provider pricing you will accept; OpenRouter will not execute the request if no provider falls under it.
Fee percentages change over time and are not hardcoded here -- verify current rates on OpenRouter's pricing pages before budgeting.
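Prices come back as USD-per-token strings and every response reports its token usage, so estimating the cost of a request is a multiply-and-add. This sketch reads only the prompt and completion price fields and the prompt_tokens and completion_tokens usage counts. It ignores any other price components a model may have, so treat the result as an estimate rather than your bill. The helper names are hypothetical.

```typescript
// Subset of a model's pricing object: USD per token, encoded as strings.
interface Pricing {
  prompt: string;
  completion: string;
}

// Subset of a response's usage object.
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
}

// True when both token prices are zero, i.e. the model is listed as free.
function isFree(pricing: Pricing): boolean {
  return Number(pricing.prompt) === 0 && Number(pricing.completion) === 0;
}

// Estimated USD cost of one request from its token counts and per-token prices.
function estimateCostUSD(pricing: Pricing, usage: Usage): number {
  return (
    usage.prompt_tokens * Number(pricing.prompt) +
    usage.completion_tokens * Number(pricing.completion)
  );
}
```

For example, take hypothetical prices of "0.000003" per prompt token and "0.000015" per completion token ($3 and $15 per million). A request using 1,000 prompt tokens and 500 completion tokens comes to about $0.0105.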

Who Should Skip OpenRouter

OpenRouter is not the right tool for every situation. Skip it if:

You have a single-provider product with no fallback requirements. If your entire product runs on one model from one provider and you have a direct API agreement, you are paying OpenRouter's fees for no benefit.

You need provider-specific features not exposed through the unified API. Some models have provider-specific parameters, fine-tuning endpoints, or batch APIs that do not map through OpenRouter's standard chat completions interface.

Latency is critical and every millisecond matters. Adding a proxy hop adds latency. For real-time, voice-adjacent, or sub-200ms applications, the additional network hop may matter.

You have compliance requirements that rule out third-party data proxies. Even with data_collection: 'deny' and ZDR endpoints, your prompts pass through OpenRouter's infrastructure. If your data governance policy prohibits any third-party proxy, go direct.

You are running extremely high volume with negotiated provider rates. At enterprise scale, direct provider relationships with custom pricing may beat OpenRouter's rates even after accounting for the routing overhead you would need to build yourself.

When Routing Genuinely Pays

The value proposition is clearest in a few scenarios: multi-model products where you want one integration instead of five; prototype and early-stage work where you are still figuring out which model fits which task; reliability-sensitive applications where automatic fallbacks during provider outages are worth the fees; and cost optimization workloads where the sort: 'price' routing or :floor shortcut meaningfully reduces spend on high-volume, lower-stakes inference.

The percentile-based routing is the most underrated feature for production use. Being able to say "give me the cheapest provider that meets p90 latency under three seconds" is a real operational improvement over manually watching provider status pages.

Sources

OpenRouter Quickstart: https://openrouter.ai/docs/quickstart
Provider Routing docs: https://openrouter.ai/docs/features/provider-routing
Model Routing (Auto Router): https://openrouter.ai/docs/features/model-routing
Models API reference: https://openrouter.ai/docs/models
All code examples verified against live documentation as of June 10, 2026

FAQ

Can I use OpenRouter as a drop-in for the OpenAI SDK?

Yes. Point the SDK's baseURL to https://openrouter.ai/api/v1 and set your OpenRouter API key. The rest of your code stays unchanged. Every OpenRouter model slug works as the model parameter.

Does OpenRouter support streaming?

Yes. The API supports standard server-sent events streaming. The native @openrouter/sdk supports streaming, and so does the OpenAI SDK drop-in path.
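With the OpenAI SDK drop-in, you stream by passing stream: true and iterating over the returned chunks, each of which carries a text fragment at choices[0].delta.content. The helper below collects those fragments. It is typed against a minimal hand-written chunk shape (an assumption, not the SDK's own type), so it can be exercised without a network call. The commented usage assumes the openai client configured in the quick start.

```typescript
// Minimal shape of a streamed chat-completion chunk (hand-written subset).
interface ChunkLike {
  choices: { delta: { content?: string | null } }[];
}

// Concatenates streamed text fragments, calling onToken for each one as it arrives.
// Chunks with no choices or no content (e.g. role-only or final chunks) are skipped.
async function collectStream(
  stream: AsyncIterable<ChunkLike>,
  onToken: (text: string) => void = () => {},
): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    const piece = chunk.choices[0]?.delta?.content ?? "";
    if (piece) {
      onToken(piece);
      text += piece;
    }
  }
  return text;
}

// Usage with the OpenAI SDK client pointed at OpenRouter:
// const stream = await openai.chat.completions.create({
//   model: 'anthropic/claude-sonnet-4.5',
//   messages: [{ role: 'user', content: 'Explain caching in plain English' }],
//   stream: true,
// });
// const full = await collectStream(stream, (t) => process.stdout.write(t));
```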

What happens when a provider goes down?

By default, allow_fallbacks is true. OpenRouter automatically tries the next provider in its list if the primary fails. You can disable this with allow_fallbacks: false if you need strict control, or you can set an explicit order array to control the fallback sequence.

How do I know which model the Auto Router actually used?

Check the model field in the response object. It will contain the actual model slug that handled the request, for example anthropic/claude-sonnet-4.5. This is useful for debugging cost anomalies and for understanding routing patterns over time.
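To turn that field into routing patterns, you can tally it across logged responses. Here is a minimal sketch, assuming you keep the parsed response objects. Only the model field is read, and countModels is a hypothetical name.

```typescript
// The one response field this sketch reads.
interface RoutedResponse {
  model: string; // slug of the model that actually handled the request
}

// Counts how often each model handled a request, most frequent first (ties sorted by name).
function countModels(responses: RoutedResponse[]): [string, number][] {
  // A prototype-free object, so slugs can never collide with built-in property names.
  const counts: Record<string, number> = Object.create(null);
  for (const r of responses) {
    counts[r.model] = (counts[r.model] || 0) + 1;
  }
  return Object.keys(counts)
    .map((model): [string, number] => [model, counts[model]])
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]));
}
```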
