TL;DR
Factory AI's Droid agent surfaces a new competitive front in coding tools: cost-per-completed-task. Here's what their architecture reveals about where the whole industry is heading.
The coding agent market is moving faster than almost any category in developer tooling right now. Every week there is a new entrant, a new benchmark number, a new funding round. The easy framing is to reduce this to a model quality race - whoever has the smartest AI wins. But after digging deep into Factory AI's documentation and product architecture, I think that framing misses what is actually interesting. The next real competitive front is not raw capability. It is cost-per-completed-task - the same dynamic running through our June 2026 pricing breakdown. And Factory, the company behind the Droid coding agent, has been building toward that thesis quietly and deliberately.
Last updated: June 10, 2026
Factory describes itself as "an AI-native software development platform that works everywhere you do." The product is called Droid, and it ships as three distinct surfaces: a CLI (droid), a desktop app, and a headless execution mode called droid exec that is clearly aimed at CI/CD pipelines and automation workflows.
The homepage at factory.ai is intentionally sparse - a curl install command, a link to their desktop app, and logos of enterprise customers including Adyen, Groq, Podium, and Chainguard. That enterprise positioning is not an accident. Factory is not trying to win the hobbyist market. They are pitching to engineering teams that need autonomous software development at scale, with governance controls to match.
The docs at docs.factory.ai tell the fuller story. The architecture is built around a few genuinely interesting ideas.
Most coding agents treat the AI as a monolith - one context window, one model, one task at a time. Factory inverted this. Their "Custom Droids" system lets you define reusable subagents as Markdown files with YAML frontmatter. Each Droid carries its own system prompt, model preference, tool policy, and reasoning effort level.
The practical result is that you can build a team of specialized agents inside a single workflow. A lightweight code-reviewer Droid running on Haiku with read-only tools handles diff analysis. A security-sweeper Droid running on Sonnet with web search handles vulnerability audits. A task-coordinator orchestrator running on Opus handles planning and delegation. Each one has exactly the permissions and capabilities it needs - and exactly the cost profile that fits its job.
This is a significant architectural bet. Custom Droids live as versioned .md files in .factory/droids/ and get checked into your repo. Your team shares them. You review prompt changes in pull requests the same way you review code changes. Factory is essentially turning agent configuration into infrastructure-as-code.
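To make the "agent configuration as code" idea concrete, here is a minimal Python sketch that renders one Droid definition and writes it under .factory/droids/. Only the `model` key and the directory come from the text above; the `name` and `tools` keys, the tool names, and the helper functions are illustrative assumptions, so check Factory's Custom Droids docs for the real schema before relying on it.

```python
from pathlib import Path


def render_droid(name: str, model: str, tools: list[str], prompt: str) -> str:
    """Return Markdown with YAML frontmatter for one Custom Droid.

    Assumption: `name` and `tools` are hypothetical frontmatter keys;
    only `model` is named in the article.
    """
    frontmatter = [
        "---",
        f"name: {name}",
        f"model: {model}",
        f"tools: [{', '.join(tools)}]",
        "---",
    ]
    # The system prompt is the Markdown body below the frontmatter.
    return "\n".join(frontmatter) + "\n\n" + prompt.strip() + "\n"


def write_droid(repo_root: Path, name: str, text: str) -> Path:
    """Write the definition to <repo_root>/.factory/droids/<name>.md."""
    droid_dir = repo_root / ".factory" / "droids"
    droid_dir.mkdir(parents=True, exist_ok=True)
    path = droid_dir / f"{name}.md"
    path.write_text(text, encoding="utf-8")
    return path


if __name__ == "__main__":
    text = render_droid(
        name="code-reviewer",
        model="claude-haiku-4-5-20251001",
        tools=["Read", "Grep"],  # hypothetical read-only tool names
        prompt="Review the diff and report risky changes. Do not edit files.",
    )
    print(write_droid(Path("."), "code-reviewer", text))
```

Because the output is an ordinary text file, a change to the prompt or the model shows up as a normal diff in a pull request.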
droid exec is where Factory's automation story gets concrete. It is a one-shot, non-interactive execution mode that completes a task and exits - designed explicitly for CI/CD integration. A few things stand out from their droid exec documentation:
You pick the model per-run with -m. You pick reasoning effort per-run with -r. The spec phase (where the agent plans before executing) can run on a different, cheaper model than the execution phase via --spec-model. Mission mode (--mission) runs multi-agent orchestration with separate --worker-model and --validator-model flags.
This is explicit, per-task model selection baked into the CLI interface. You are not picking a model once in your settings and hoping for the best. You are routing each class of task to the right model, at invocation time, with full CLI composability.
A security sweep over changed files does not need Opus. A nightly dead-code detection job running over four modules in parallel via a GitHub Actions matrix does not need Opus. A final "validate before merge" run probably does. Factory's CLI makes all of this scriptable.
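One way to script that routing is a small lookup table that maps each class of CI task to a model and reasoning effort, then builds the droid exec command line. This is a sketch under stated assumptions: the -m and -r flags come from the docs described above, but the task class names, the routing choices, and the "low"/"high" effort values are hypothetical placeholders.

```python
import shlex

# Hypothetical routing table: task class -> (model ID, reasoning effort).
# The effort values are assumptions; check the droid exec docs for valid ones.
ROUTES = {
    "security-sweep": ("claude-haiku-4-5-20251001", "low"),
    "dead-code": ("claude-haiku-4-5-20251001", "low"),
    "pre-merge-validate": ("claude-opus-4-5-20251101", "high"),
}


def build_exec_argv(task_class: str, prompt: str) -> list[str]:
    """Build the argv for one non-interactive droid exec run."""
    model, effort = ROUTES[task_class]
    return ["droid", "exec", "-m", model, "-r", effort, prompt]


if __name__ == "__main__":
    for task_class in ROUTES:
        argv = build_exec_argv(task_class, f"run the {task_class} job")
        # shlex.join quotes the prompt so the line can be pasted into a shell.
        print(shlex.join(argv))
```

Keeping the table in one place means a model swap for a whole class of jobs is a one-line change rather than an edit to every workflow file.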
Factory's model strategy goes further than most tools acknowledge. Their BYOK (Bring Your Own Key) system at docs.factory.ai/cli/byok/overview supports Anthropic, OpenAI, and any generic-chat-completion-api compatible endpoint - which covers OpenRouter, Fireworks, Together AI, Groq, Ollama, and more.
Their Available Models page lists a "Droid Core" tier of open-weight models with multipliers as low as 0.12x of their base plan cost: MiniMax M2.5 and M2.7, Kimi K2.5 and K2.6, DeepSeek V4 Pro, Nemotron 3 Ultra, and GLM-5.1. These are not toy models. They are production-grade frontier alternatives that happen to cost dramatically less than the Anthropic and OpenAI flagship options.
The pricing docs note that Droid Core models have separate rate limits and are consumed before Extra Usage credits kick in. In plain terms: for tasks where an open-weight model is good enough, you can effectively get more throughput for free within your existing plan tier.
All of this connects to a broader shift happening across the developer tools market. The Anthropic pricing landscape makes the economics viscerally clear. With the June 2026 Claude lineup, the cost spread is enormous:
| Model tier | Input (per MTok) | Output (per MTok) | Best for |
|---|---|---|---|
| Claude Opus (frontier) | $10 | $50 | Complex reasoning, architecture, multi-step planning |
| Claude Sonnet | $3 | $15 | Code generation, refactoring, moderate complexity |
| Claude Haiku | $1 | $5 | Review, analysis, classification, simple edits |
| Open-weight (e.g. MiniMax M2.5 via Factory Droid Core) | ~$0.10 | ~$0.40 | High-volume batch work, scaffolding, lint-style checks |
A task that costs $0.50 on Opus costs $0.05 on Haiku and fractions of a cent on an open-weight model. If you are running hundreds of automated CI tasks per day, that spread is the difference between a $50/month tool and a $500/month infrastructure line item.
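The arithmetic behind those figures is easy to reproduce. The sketch below uses the list prices from the table and a hypothetical task of 30,000 input tokens and 4,000 output tokens, a size chosen only because it lands on the $0.50 Opus figure. With that assumed task, the function gives $0.50 on Opus, $0.15 on Sonnet, $0.05 on Haiku, and about $0.0046 on the open-weight tier.

```python
# List prices in USD per million tokens (input, output), from the table above.
PRICES = {
    "opus": (10.00, 50.00),
    "sonnet": (3.00, 15.00),
    "haiku": (1.00, 5.00),
    "open-weight": (0.10, 0.40),  # approximate, per the table
}


def task_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one task at the given tier's list prices."""
    in_price, out_price = PRICES[tier]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def monthly_cost(tier: str, tasks_per_month: int,
                 input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of running the same task tasks_per_month times."""
    return tasks_per_month * task_cost(tier, input_tokens, output_tokens)


if __name__ == "__main__":
    # Hypothetical task size; substitute token counts from your own runs.
    for tier in PRICES:
        print(f"{tier:12s} ${task_cost(tier, 30_000, 4_000):.4f} per task")
```

Real agent runs vary widely in token usage, so it is worth feeding measured input and output counts into `task_cost` before extrapolating to a monthly budget.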
This is the same dynamic driving OpenRouter's Auto Router (which uses NotDiamond to select a model automatically based on prompt complexity) and Claude Code's effort levels (where --max-turns and thinking budget control cost per session). Cursor's model selector, GitHub Copilot's credit tiers, and Amazon Q's per-action billing all reflect the same underlying pressure. The market is converging on the view that developers should not pay frontier prices for non-frontier tasks.
Factory's answer is to put that routing control directly in the hands of the developer, at the CLI level, with per-subagent model configuration as a first-class primitive.
You do not have to wait for some future magic router. The patterns are available right now, across multiple tools.
Per-task model pinning in droid exec. Use --spec-model claude-haiku-4-5-20251001 for the planning phase and the default Sonnet or Opus for execution. The spec phase does most of the token-heavy reading and planning work. Routing it to Haiku, at roughly one third of Sonnet's per-token list price in the table above, can cut planning costs significantly without sacrificing execution quality.
Custom Droid tiering. Define your read-only analysis Droids with model: claude-haiku-4-5-20251001 or even model: custom:MiniMax-M2.5-0 via BYOK. Reserve model: inherit (or explicit Opus) for your orchestrator and complex reasoning Droids. This configuration lives in version control. You review it. You tune it.
Mission mode with differentiated worker/validator models. droid exec --mission --worker-model claude-sonnet-4-5-20250929 --validator-model claude-opus-4-5-20251101 --auto medium "ship the billing webhook" runs workers at Sonnet cost and only escalates to Opus for validation. Workers do volume. Validators do judgment.
Parallel droid exec with worktrees. Factory's --worktree flag lets you run parallel droid exec jobs against the same repo without file conflicts. Each job gets its own model flag. Fan-out cheap analysis jobs at Haiku or open-weight cost, then collect results into a single Opus orchestration step.
OpenRouter as a BYOK routing layer. Because Factory supports generic-chat-completion-api compatible endpoints, you can point a custom model entry at OpenRouter's openrouter/auto endpoint. OpenRouter's auto router picks the optimal model per prompt. You get dynamic routing on top of Factory's static per-Droid configuration.
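The mission-mode pattern above can be wrapped in a few lines of Python so a pipeline script sets the worker and validator models in one place. The flags and model IDs are the ones quoted in the mission-mode command above; the function names and the default of "medium" for --auto are illustrative choices, and `run_mission` assumes the droid CLI is installed and on PATH.

```python
import shlex
import subprocess


def mission_argv(prompt: str, worker_model: str, validator_model: str,
                 autonomy: str = "medium") -> list[str]:
    """Build a droid exec mission command with separate worker/validator models."""
    return [
        "droid", "exec", "--mission",
        "--worker-model", worker_model,
        "--validator-model", validator_model,
        "--auto", autonomy,
        prompt,
    ]


def run_mission(argv: list[str]) -> None:
    """Run the command and raise CalledProcessError on a non-zero exit."""
    subprocess.run(argv, check=True)


if __name__ == "__main__":
    argv = mission_argv(
        "ship the billing webhook",
        worker_model="claude-sonnet-4-5-20250929",
        validator_model="claude-opus-4-5-20251101",
    )
    # Print rather than execute, so the sketch is safe to run anywhere.
    print(shlex.join(argv))
```

Passing the command as a list rather than a single string avoids shell-quoting problems when the prompt contains spaces or quotes.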
Factory has been methodical. Their HN presence is modest - a 2024 SWE-bench result post, a few community discussion threads, early product news - but their enterprise customer list (Adyen is not a toy customer) suggests real production usage. Their pricing tiers ($20/mo Pro, $100/mo Plus at ~5x Pro's usage, and $200/mo Max at ~10x Pro's usage) are structured to grow with team usage, not to gate features.
A few things worth watching:
The Factory Router mentioned in their documentation navigation (linked as "Previous" from the droid exec page) suggests internal routing infrastructure may be coming as an explicit feature, not just a CLI pattern. The current doc page was not publicly accessible at time of writing, but the navigation link exists.
Their open-model tier (Droid Core) is getting real investment. MiniMax M2.7 at 0.12x cost is a significant option for high-volume batch work. As open-weight models continue to improve, the cost-quality tradeoff for non-frontier tasks keeps shifting in the developer's favor.
The Claude Code agents import feature - where you can import ~/.claude/agents/ subagents directly into Factory's Droids system - is a small detail that reveals something about their strategic positioning. They are not trying to force lock-in. They are trying to be composable with the broader agent ecosystem.
This competition is genuinely good for developers. Multiple well-funded teams are racing to give you more control over what your tokens actually do. The winner in this category will not be the tool with the smartest default model. It will be the tool that makes the smartest routing decisions - or gives you the cleanest interface to make them yourself.
Factory is building a real answer to that problem.
Droid is Factory AI's coding agent, available as a CLI, desktop app, and headless droid exec mode for CI/CD automation. It supports custom subagents (Custom Droids) with per-agent model selection, tool permissions, and reusable system prompts stored as Markdown files in your repo.
Factory also lets you bring your own API keys. Its BYOK system supports Anthropic, OpenAI, and any OpenAI-compatible API endpoint via generic-chat-completion-api - which covers OpenRouter, Groq, Fireworks, Together AI, Ollama, and more. Keys are stored locally and are not uploaded to Factory servers.
Model routing means directing different tasks to different AI models based on cost and complexity. Simple analysis tasks route to cheap models (Haiku, open-weight); complex reasoning routes to frontier models (Opus, GPT-5.5). Tools like Factory's Droid, Claude Code's effort levels, and OpenRouter's auto router all implement variations of this pattern to reduce cost-per-completed-task.
Factory offers Pro ($20/mo), Plus ($100/mo, ~5x usage of Pro), and Max ($200/mo, ~10x usage of Pro) plans for individuals, plus Teams and Enterprise tiers. All plans include access to Droid Core open-weight models at no additional token cost. Extra Usage credits ($10 minimum) let you continue working after plan limits are reached.