Factory AI and the Model Routing Era: How Coding Agents Are Learning to Spend Your Tokens Wisely

The coding agent market is moving faster than almost any category in developer tooling right now. Every week there is a new entrant, a new benchmark number, a new funding round. The easy framing is to reduce this to a model quality race - whoever has the smartest AI wins. But after digging deep into Factory AI's documentation and product architecture, I think that framing misses what is actually interesting. The next real competitive front is not raw capability. It is cost-per-completed-task - the same dynamic running through our June 2026 pricing breakdown. And Factory, the company behind the Droid coding agent, has been building toward that thesis quietly and deliberately.

Last updated: June 10, 2026

What Factory AI Actually Builds

Factory describes itself as "an AI-native software development platform that works everywhere you do." The product is called Droid, and it ships as three distinct surfaces: a CLI (droid), a desktop app, and a headless execution mode called droid exec that is clearly aimed at CI/CD pipelines and automation workflows.

The homepage at factory.ai is intentionally sparse - a curl install command, a link to their desktop app, and logos of enterprise customers including Adyen, Groq, Podium, and Chainguard. That enterprise positioning is not an accident. Factory is not trying to win the hobbyist market. They are pitching to engineering teams that need autonomous software development at scale, with governance controls to match.

The docs at docs.factory.ai tell the fuller story. The architecture is built around a few genuinely interesting ideas.

The Droids Concept: Subagents as First-Class Citizens

Most coding agents treat the AI as a monolith - one context window, one model, one task at a time. Factory inverted this. Their "Custom Droids" system lets you define reusable subagents as Markdown files with YAML frontmatter. Each Droid carries its own system prompt, model preference, tool policy, and reasoning effort level.

The practical result is that you can build a team of specialized agents inside a single workflow. A lightweight code-reviewer Droid running on Haiku with read-only tools handles diff analysis. A security-sweeper Droid running on Sonnet with web search handles vulnerability audits. A task-coordinator orchestrator running on Opus handles planning and delegation. Each one has exactly the permissions and capabilities it needs - and exactly the cost profile that fits its job.

This is a significant architectural bet. Custom Droids live as versioned .md files in .factory/droids/ and get checked into your repo. Your team shares them. You review prompt changes in pull requests the same way you review code changes. Factory is essentially turning agent configuration into infrastructure-as-code.
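To make the file layout concrete, here is a minimal sketch that renders a Custom Droid as Markdown with YAML frontmatter and places it under .factory/droids/. The model key and the directory come from the docs as described above. The name, reasoningEffort, and tools keys, the Read and Grep tool names, and the "low" effort value are my assumptions for illustration, so check the Custom Droids reference for the real schema before copying this.

```python
from pathlib import Path


def render_droid(name: str, model: str, tools: list[str], effort: str, prompt: str) -> str:
    """Return the text of a Custom Droid file: YAML frontmatter, then the system prompt."""
    frontmatter = [
        "---",
        f"name: {name}",                 # assumed key name
        f"model: {model}",               # key name taken from the article
        f"reasoningEffort: {effort}",    # assumed key name and value
        f"tools: [{', '.join(tools)}]",  # assumed key name and tool ids
        "---",
    ]
    return "\n".join(frontmatter) + "\n\n" + prompt.strip() + "\n"


def write_droid(repo_root: Path, name: str, text: str) -> Path:
    """Write the droid under <repo>/.factory/droids/<name>.md and return the path."""
    path = repo_root / ".factory" / "droids" / f"{name}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return path


# A read-only reviewer pinned to the cheaper model tier.
reviewer = render_droid(
    name="code-reviewer",
    model="claude-haiku-4-5-20251001",
    tools=["Read", "Grep"],
    effort="low",
    prompt="Review the diff for correctness issues. Do not modify files.",
)
print(reviewer)
```

Calling write_droid(Path("."), "code-reviewer", reviewer) would drop the file into the repo, where it can go through pull request review like any other change.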

Droid Exec: The Headless Mode That Changes Everything

droid exec is where Factory's automation story gets concrete. It is a one-shot, non-interactive execution mode that completes a task and exits - designed explicitly for CI/CD integration. A few things stand out from their droid exec documentation:

You pick the model per-run with -m. You pick reasoning effort per-run with -r. The spec phase (where the agent plans before executing) can run on a different, cheaper model than the execution phase via --spec-model. Mission mode (--mission) runs multi-agent orchestration with separate --worker-model and --validator-model flags.

This is explicit, per-task model selection baked into the CLI interface. You are not picking a model once in your settings and hoping for the best. You are routing each class of task to the right model, at invocation time, with full CLI composability.

A security sweep over changed files does not need Opus. A nightly dead-code detection job running over four modules in parallel via a GitHub Actions matrix does not need Opus. A final "validate before merge" run probably does. Factory's CLI makes all of this scriptable.
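One way to script that routing is a small lookup table that turns a task class into a droid exec invocation. This is only a sketch: the -m and -r flags and the model IDs are the ones quoted in this article, while the task class names, the "low" and "high" effort values, and the prompts are hypothetical placeholders.

```python
import shlex

# Task class -> (model id, reasoning effort).
# The class names and effort values are illustrative assumptions.
ROUTES = {
    "security-sweep": ("claude-haiku-4-5-20251001", "low"),
    "dead-code-scan": ("claude-haiku-4-5-20251001", "low"),
    "pre-merge-validate": ("claude-opus-4-5-20251101", "high"),
}


def build_command(task_class: str, prompt: str) -> list[str]:
    """Build the argv for one droid exec run, with the model chosen by task class."""
    model, effort = ROUTES[task_class]
    return ["droid", "exec", "-m", model, "-r", effort, prompt]


cmd = build_command("security-sweep", "Audit the changed files for injection risks")
# Print a copy-pasteable shell line; a CI step could pass cmd to subprocess.run instead.
print(shlex.join(cmd))
```

Keeping the table in one place means a cost review is a one-line diff: move a task class to a different tier and every job that uses it follows.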

BYOK and the Open Model Tier

Factory's model strategy goes further than most tools acknowledge. Their BYOK (Bring Your Own Key) system, documented at docs.factory.ai/cli/byok/overview, supports Anthropic, OpenAI, and any OpenAI-compatible endpoint via generic-chat-completion-api - which covers OpenRouter, Fireworks, Together AI, Groq, Ollama, and more.

Their Available Models page lists a "Droid Core" tier of open-weight models with multipliers as low as 0.12x of their base plan cost: MiniMax M2.5 and M2.7, Kimi K2.5 and K2.6, DeepSeek V4 Pro, Nemotron 3 Ultra, and GLM-5.1. These are not toy models. They are production-grade frontier alternatives that happen to cost dramatically less than the Anthropic and OpenAI flagship options.

The pricing docs note that Droid Core models have separate rate limits and are consumed before Extra Usage credits kick in. In plain terms: for tasks where an open-weight model is good enough, you can get more throughput inside your existing plan tier before paying for anything extra.

Why Model Routing Is the Real Story

All of this connects to a broader shift happening across the developer tools market. The Anthropic pricing landscape makes the economics viscerally clear. With the June 2026 Claude lineup, the cost spread is enormous:

Model tier	Input (per MTok)	Output (per MTok)	Best for
Claude Opus (frontier)	$10	$50	Complex reasoning, architecture, multi-step planning
Claude Sonnet	$3	$15	Code generation, refactoring, moderate complexity
Claude Haiku	$1	$5	Review, analysis, classification, simple edits
Open-weight (e.g. MiniMax M2.5 via Factory Droid Core)	~$0.10	~$0.40	High-volume batch work, scaffolding, lint-style checks

A task that costs $0.50 on Opus costs $0.05 on Haiku and fractions of a cent on an open-weight model. If you are running hundreds of automated CI tasks per day, that spread is the difference between a $50/month tool and a $500/month infrastructure line item.
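The per-task arithmetic is easy to reproduce. The sketch below uses the per-MTok prices from the table and assumes a task that reads 25,000 input tokens and writes 5,000 output tokens. That token mix is my assumption, chosen because it lands on the $0.50 Opus figure, and real tasks will vary widely.

```python
# USD per million tokens as (input, output), copied from the table above.
PRICES = {
    "opus": (10.00, 50.00),
    "sonnet": (3.00, 15.00),
    "haiku": (1.00, 5.00),
    "open-weight": (0.10, 0.40),  # approximate figures in the table
}


def task_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one task at the given tier."""
    price_in, price_out = PRICES[tier]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Assumed token mix for a single task.
INPUT_TOKENS, OUTPUT_TOKENS = 25_000, 5_000

opus_cost = task_cost("opus", INPUT_TOKENS, OUTPUT_TOKENS)
for tier in PRICES:
    cost = task_cost(tier, INPUT_TOKENS, OUTPUT_TOKENS)
    print(f"{tier:<12} ${cost:.4f} per task  (Opus costs {opus_cost / cost:.0f}x this)")
```

With this token mix, Opus comes out at $0.50 per task, Sonnet at $0.15, Haiku at $0.05, and the open-weight tier at under half a cent.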

This is the same dynamic driving OpenRouter's Auto Router (which uses NotDiamond to select a model automatically based on prompt complexity) and Claude Code's effort levels (where --max-turns and thinking budget control cost per session). Cursor's model selector, GitHub Copilot's credit tiers, and Amazon Q's per-action billing all reflect the same underlying pressure. The market is converging on the view that developers should not pay frontier prices for non-frontier tasks.

Factory's answer is to put that routing control directly in the hands of the developer, at the CLI level, with per-subagent model configuration as a first-class primitive.

Practical Routing Patterns You Can Use Today

You do not have to wait for some future magic router. The patterns are available right now, across multiple tools.

Per-task model pinning in droid exec. Use --spec-model claude-haiku-4-5-20251001 for the planning phase and the default Sonnet or Opus for execution. The spec phase does most of the token-heavy reading and planning work. Routing it to Haiku, at roughly one-third of Sonnet's per-token price in the table above, can cut planning costs significantly without sacrificing execution quality.

Custom Droid tiering. Define your read-only analysis Droids with model: claude-haiku-4-5-20251001 or even model: custom:MiniMax-M2.5-0 via BYOK. Reserve model: inherit (or explicit Opus) for your orchestrator and complex reasoning Droids. This configuration lives in version control. You review it. You tune it.

Mission mode with differentiated worker/validator models. droid exec --mission --worker-model claude-sonnet-4-5-20250929 --validator-model claude-opus-4-5-20251101 --auto medium "ship the billing webhook" runs workers at Sonnet cost and only escalates to Opus for validation. Workers do volume. Validators do judgment.

Parallel droid exec with worktrees. Factory's --worktree flag lets you run parallel droid exec jobs against the same repo without file conflicts. Each job gets its own model flag. Fan-out cheap analysis jobs at Haiku or open-weight cost, then collect results into a single Opus orchestration step.

OpenRouter as a BYOK routing layer. Because Factory supports generic-chat-completion-api compatible endpoints, you can point a custom model entry at OpenRouter's openrouter/auto endpoint. OpenRouter's auto router picks the optimal model per prompt. You get dynamic routing on top of Factory's static per-Droid configuration.
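To get a feel for what the first pattern (a cheaper spec model) is worth, here is a rough estimate that prices the spec and execution phases separately. The Sonnet and Haiku prices come from the table earlier in this article. The token counts per phase are invented for illustration, so substitute numbers from your own runs.

```python
# USD per million tokens as (input, output), from the pricing table in this article.
PRICES = {"sonnet": (3.00, 15.00), "haiku": (1.00, 5.00)}


def phase_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one phase of a run at the given tier."""
    price_in, price_out = PRICES[tier]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Hypothetical token usage per phase as (input, output).
SPEC = (40_000, 4_000)  # reading-heavy planning phase
EXEC = (20_000, 6_000)  # execution phase

all_sonnet = phase_cost("sonnet", *SPEC) + phase_cost("sonnet", *EXEC)
split = phase_cost("haiku", *SPEC) + phase_cost("sonnet", *EXEC)
saving = 1 - split / all_sonnet

print(f"all Sonnet:          ${all_sonnet:.2f} per run")
print(f"Haiku spec + Sonnet: ${split:.2f} per run")
print(f"saving:              {saving:.0%}")
```

Under these assumptions the split run costs $0.21 against $0.33 for an all-Sonnet run, a saving of about 36%. The more of a run's tokens sit in the spec phase, the larger that saving gets.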

What to Watch Next

Factory has been methodical. Their HN presence is modest - a 2024 SWE-bench result post, a few community discussion threads, early product news - but their enterprise customer list (Adyen is not a toy customer) suggests real production usage. Their pricing tiers ($20/mo Pro, $100/mo Plus at ~5x Pro usage, and $200/mo Max at ~10x Pro usage) are structured to grow with team usage, not to gate features.

A few things worth watching:

The Factory Router mentioned in their documentation navigation (linked as "Previous" from the droid exec page) suggests internal routing infrastructure may be coming as an explicit feature, not just a CLI pattern. The current doc page was not publicly accessible at time of writing, but the navigation link exists.

Their open-model tier (Droid Core) is getting real investment. MiniMax M2.7 at 0.12x cost is a significant option for high-volume batch work. As open-weight models continue to improve, the cost-quality tradeoff for non-frontier tasks keeps shifting in the developer's favor.

The Claude Code agents import feature - where you can import ~/.claude/agents/ subagents directly into Factory's Droids system - is a small detail that reveals something about their strategic positioning. They are not trying to force lock-in. They are trying to be composable with the broader agent ecosystem.

This competition is genuinely good for developers. Multiple well-funded teams are racing to give you more control over what your tokens actually do. The winner in this category will not be the tool with the smartest default model. It will be the tool that makes the smartest routing decisions - or gives you the cleanest interface to make them yourself.

Factory is building a real answer to that problem.

FAQ

What is Factory AI's Droid agent?

Droid is Factory AI's coding agent, available as a CLI, desktop app, and headless droid exec mode for CI/CD automation. It supports custom subagents (Custom Droids) with per-agent model selection, tool permissions, and reusable system prompts stored as Markdown files in your repo.

Does Factory AI support Bring Your Own Key (BYOK)?

Yes. Factory's BYOK system supports Anthropic, OpenAI, and any OpenAI-compatible API endpoint via generic-chat-completion-api - which covers OpenRouter, Groq, Fireworks, Together AI, Ollama, and more. Keys are stored locally and are not uploaded to Factory servers.

How does model routing work in coding agents?

Model routing means directing different tasks to different AI models based on cost and complexity. Simple analysis tasks route to cheap models (Haiku, open-weight); complex reasoning routes to frontier models (Opus, GPT-5.5). Tools like Factory's Droid, Claude Code's effort levels, and OpenRouter's auto router all implement variations of this pattern to reduce cost-per-completed-task.

What are Factory AI's pricing tiers?

Factory offers Pro ($20/mo), Plus ($100/mo, ~5x usage of Pro), and Max ($200/mo, ~10x usage of Pro) plans for individuals, plus Teams and Enterprise tiers. All plans include access to Droid Core open-weight models at no additional token cost. Extra Usage credits ($10 minimum) let you continue working after plan limits are reached.

Sources

factory.ai - Factory homepage
docs.factory.ai - Factory documentation index
docs.factory.ai/pricing - Plans and pricing
docs.factory.ai/cli/byok/overview - BYOK documentation
docs.factory.ai/cli/configuration/custom-droids - Custom Droids reference
docs.factory.ai/cli/droid-exec/overview - Droid Exec documentation
docs.factory.ai/models - Available models and multipliers
openrouter.ai/docs/features/model-routing - OpenRouter Auto Router documentation
hn.algolia.com - Hacker News community discussion search
