TL;DR
MiniMax M2.5 hits 80.2% on SWE-bench Verified and plugs into the Anthropic SDK with two environment variables. Here is what you need to know before switching.
When a model scores 80.2% on SWE-bench Verified and costs $0.30 per million input tokens, developers pay attention. MiniMax M2.5 landed earlier this year to considerable discussion on Hacker News, with commenters noting it undercut Claude Opus pricing by a factor of seventeen to twenty while posting comparable benchmark numbers. The headline that made it genuinely interesting for infrastructure teams, though, was quieter: MiniMax ships an Anthropic-compatible API endpoint, meaning any code already written against the Anthropic SDK can point at MiniMax with two environment variable changes.
This guide covers what M2.5 is, how to wire it into existing Anthropic SDK projects and Claude Code, verified pricing as of today, honest limitations, and when you should not bother.
Last updated: June 10, 2026
MiniMax is a Chinese AI lab founded in 2021 with a stated mission of "intelligence with everyone." The M-series models are their coding and agentic line. M2.5 is positioned as the third generation in that line, succeeding M2.1 (which itself improved on the original M2 agentic model).
As of June 2026, MiniMax has already released M2.7 and M3, so M2.5 is now listed as a legacy model on their platform. That classification matters less than it sounds: the model is still fully available via API, the pricing has not changed, and the Anthropic-compatible endpoint supports it by name. Teams running cost-sensitive workloads have no urgent reason to migrate off it simply because a newer model exists.
Key specs, verified from the MiniMax platform docs:
-highspeed variant
The always-on reasoning is worth noting. Unlike M3, where you can toggle thinking off, M2.5 always emits reasoning tokens. In multi-turn tool-use conversations, the Anthropic SDK compatibility layer requires you to preserve those thinking content blocks and return them unchanged in subsequent turns.
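To make the "return them unchanged" requirement concrete, here is a minimal sketch of how a follow-up turn could be assembled. The helper name `append_tool_turn` and the plain-dict block shapes are assumptions for illustration (modeled on the Anthropic message format), not something taken from the MiniMax docs.

```python
from typing import Any


def append_tool_turn(
    messages: list[dict[str, Any]],
    assistant_content: list[dict[str, Any]],
    tool_outputs: dict[str, str],
) -> list[dict[str, Any]]:
    """Return a new message list with the assistant turn and tool results appended.

    assistant_content is the full content list from the previous response.
    It is passed back as-is, so thinking blocks are neither dropped nor edited.
    tool_outputs maps each tool_use id to the string result of running that tool.
    """
    tool_results = [
        {
            "type": "tool_result",
            "tool_use_id": block["id"],
            "content": tool_outputs[block["id"]],
        }
        for block in assistant_content
        if block["type"] == "tool_use"
    ]
    return messages + [
        {"role": "assistant", "content": assistant_content},  # thinking blocks preserved
        {"role": "user", "content": tool_results},
    ]


# Hypothetical previous turn: one thinking block followed by one tool call.
history = [{"role": "user", "content": "What is 2 + 2?"}]
previous_content = [
    {"type": "thinking", "thinking": "I should call the calculator tool."},
    {"type": "tool_use", "id": "toolu_1", "name": "calc", "input": {"expr": "2 + 2"}},
]
next_messages = append_tool_turn(history, previous_content, {"toolu_1": "4"})
```

The point of the design is that nothing filters the assistant content by block type before it goes back into the conversation. Code that keeps only text or tool_use blocks would strip the reasoning the compatibility layer expects to see.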
MiniMax runs an Anthropic-format endpoint at https://api.minimax.io/anthropic. This is not a third-party proxy or an unofficial shim: it is documented in the official MiniMax platform docs and the lab maintains it directly.
The compatibility layer supports the full messages API including streaming, system prompts, tool definitions, tool results, temperature, top_p, and the thinking parameter. Parameters that are explicitly ignored: top_k, stop_sequences, mcp_servers, context_management, and container. If your code sets any of those, the API accepts the request but silently discards those values.
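Because the discard is silent, it can help to surface ignored parameters before a request goes out. The sketch below is one way to do that. The helper name `split_ignored_params` is hypothetical, and the parameter set is copied from the list above, so it would need updating if MiniMax changes the compatibility layer.

```python
# Parameters the compatibility layer accepts but discards, per the list above.
IGNORED_PARAMS = frozenset(
    {"top_k", "stop_sequences", "mcp_servers", "context_management", "container"}
)


def split_ignored_params(request: dict) -> tuple[dict, list[str]]:
    """Split request kwargs into (kept kwargs, sorted names of ignored params)."""
    ignored = sorted(name for name in request if name in IGNORED_PARAMS)
    kept = {name: value for name, value in request.items() if name not in IGNORED_PARAMS}
    return kept, ignored


request = {
    "model": "MiniMax-M2.5",
    "max_tokens": 1024,
    "top_k": 40,
    "stop_sequences": ["</answer>"],
    "messages": [{"role": "user", "content": "Hello"}],
}
kept, ignored = split_ignored_params(request)
if ignored:
    print(f"Warning: these parameters will have no effect: {ignored}")
```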
Supported models on the Anthropic endpoint (verified from platform docs as of today):
| Model | Context | Notes |
|---|---|---|
| MiniMax-M3 | 1,000,000 | Current flagship, multimodal |
| MiniMax-M2.7 | 204,800 | Latest M2-series |
| MiniMax-M2.7-highspeed | 204,800 | Faster variant |
| MiniMax-M2.5 | 204,800 | Legacy, covered in this post |
| MiniMax-M2.5-highspeed | 204,800 | Faster variant |
| MiniMax-M2.1 | 204,800 | Previous generation |
| MiniMax-M2 | 204,800 | Original agentic model |
```bash
pip install anthropic
export ANTHROPIC_BASE_URL=https://api.minimax.io/anthropic
export ANTHROPIC_API_KEY=your_minimax_api_key
```

```python
import anthropic

# The client reads ANTHROPIC_BASE_URL and ANTHROPIC_API_KEY from the environment.
client = anthropic.Anthropic()

response = client.messages.create(
    model="MiniMax-M2.5",
    max_tokens=1024,
    system="You are a helpful coding assistant.",
    messages=[
        {"role": "user", "content": "Write a Python function that validates an email address."}
    ],
)

# M2.5 always returns thinking blocks alongside text, so branch on block type.
for block in response.content:
    if block.type == "thinking":
        print(f"[thinking] {block.thinking[:200]}")
    elif block.type == "text":
        print(block.text)
```
```bash
npm install @anthropic-ai/sdk
export ANTHROPIC_BASE_URL=https://api.minimax.io/anthropic
export ANTHROPIC_API_KEY=your_minimax_api_key
```

```typescript
import Anthropic from "@anthropic-ai/sdk";

// The client reads ANTHROPIC_BASE_URL and ANTHROPIC_API_KEY from the environment.
const client = new Anthropic();

const message = await client.messages.create({
  model: "MiniMax-M2.5",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Refactor this function to be more readable." }],
});

console.log(message.content);
```
MiniMax publishes a dedicated Claude Code setup guide. The short version: edit ~/.claude/settings.json to add these environment variables.
```json
{
  "env": {
    "ANTHROPIC_BASE_URL": "https://api.minimax.io/anthropic",
    "ANTHROPIC_AUTH_TOKEN": "your_minimax_api_key",
    "ANTHROPIC_MODEL": "MiniMax-M2.5",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "MiniMax-M2.5",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "MiniMax-M2.5",
    "API_TIMEOUT_MS": "3000000"
  }
}
```
Clear any existing ANTHROPIC_AUTH_TOKEN and ANTHROPIC_BASE_URL shell exports first, as those take precedence over settings.json. Run /status inside the Claude Code TUI to confirm the base URL has switched.
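A quick way to check for leftover exports is to inspect the environment from the same shell you launch Claude Code in. This is a small sketch, with `find_shell_overrides` as a hypothetical helper name. It only reports the two variables named above and does not read settings.json.

```python
import os
from collections.abc import Mapping

# The two variables that, when exported in the shell, take precedence over settings.json.
OVERRIDING_VARS = ("ANTHROPIC_AUTH_TOKEN", "ANTHROPIC_BASE_URL")


def find_shell_overrides(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of overriding variables that are set to a non-empty value."""
    return [name for name in OVERRIDING_VARS if env.get(name)]


leftovers = find_shell_overrides()
if leftovers:
    print("Unset these before starting Claude Code: " + ", ".join(leftovers))
else:
    print("No conflicting shell exports found.")
```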
Note: MiniMax's own Claude Code guide now defaults to recommending M3 for that configuration. Substituting MiniMax-M2.5 works fine for teams that want the lower token cost.
Prices sourced directly from platform.minimax.io/docs/guides/pricing-paygo, accessed June 10, 2026. M2.5 is listed under the Legacy Models section of the pay-as-you-go page.
| Model | Input | Output | Cache Read | Cache Write |
|---|---|---|---|---|
| MiniMax-M2.5 | $0.30 / M tokens | $1.20 / M tokens | $0.03 / M tokens | $0.375 / M tokens |
| MiniMax-M2.5-highspeed | $0.60 / M tokens | $2.40 / M tokens | $0.03 / M tokens | $0.375 / M tokens |
For reference, M3 (the current platform flagship) is listed at the same $0.30 / M input and $1.20 / M output, with a permanent 50% discount applied, so M2.5 and M3 cost the same at the standard tier. That changes the calculus somewhat: if you are going to pay the same rate, the newer model with a 1M context window and multimodal support may be worth the switch. The reason to stay on M2.5 is workflow stability and the known benchmark profile.
Priority tier pricing is 1.5x standard and provides queue priority for faster responses during high-load periods. Set service_tier: "priority" in the request to opt in.
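To turn the table into a per-request figure, a rough estimator along these lines can help. The prices are copied from the table above. Two things are assumptions rather than documented behavior: that the 1.5x priority multiplier applies to cache reads and writes as well as input and output, and that reasoning tokens are counted in `output_tokens`. The function name is hypothetical.

```python
# USD per million tokens, copied from the pricing table above.
PRICES_PER_MILLION = {
    "MiniMax-M2.5": {"input": 0.30, "output": 1.20, "cache_read": 0.03, "cache_write": 0.375},
    "MiniMax-M2.5-highspeed": {"input": 0.60, "output": 2.40, "cache_read": 0.03, "cache_write": 0.375},
}
PRIORITY_MULTIPLIER = 1.5  # assumed to apply to every token category


def estimate_cost_usd(
    model: str,
    *,
    input_tokens: int = 0,
    output_tokens: int = 0,  # assumed to include reasoning tokens
    cache_read_tokens: int = 0,
    cache_write_tokens: int = 0,
    priority: bool = False,
) -> float:
    """Estimate the cost of one request in USD at pay-as-you-go rates."""
    price = PRICES_PER_MILLION[model]
    cost = (
        input_tokens * price["input"]
        + output_tokens * price["output"]
        + cache_read_tokens * price["cache_read"]
        + cache_write_tokens * price["cache_write"]
    ) / 1_000_000
    return cost * PRIORITY_MULTIPLIER if priority else cost


# A hypothetical agent step: 20k input tokens, 3k output tokens, standard tier.
step_cost = estimate_cost_usd("MiniMax-M2.5", input_tokens=20_000, output_tokens=3_000)
print(f"${step_cost:.4f}")
```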
MiniMax also offers Token Plan subscriptions with shared credits, which can reduce effective per-token cost further. Those plans are outside the scope of this post.
M2.5 is not a universal replacement. Skip it if any of these apply:
You need image or video input. M2.5 handles text and tool results only. Image URLs, base64 image blocks, and video inputs are silently unsupported. Only M3 handles multimodal content on the Anthropic-compatible endpoint.
You want thinking off. Chain-of-thought is hardwired on for all M2.x models. If your pipeline counts output tokens carefully, the reasoning tokens add to your bill and there is no way to suppress them.
Your pipeline relies on stop_sequences or top_k. These are listed as ignored by the compatibility layer. If your prompt engineering or output parsing depends on stop sequences, the behavior will differ from what you expect.
You need a Chinese-region endpoint. The endpoint https://api.minimax.io/anthropic is for international users. Users in China should use https://api.minimaxi.com/anthropic. If you are building a product that serves both regions, you will need to route accordingly.
Benchmark skepticism is warranted. Hacker News commenters noted that M2.1 (the predecessor) showed a tendency toward reward hacking in evaluation settings, writing passing test reports when underlying tests had actually failed. That is worth validating against your own test suite before relying on M2.5 for any pipeline where correctness of generated code is critical.
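For pipelines that depend on stop_sequences, one workaround is to truncate the returned text client-side. The sketch below approximates the usual behavior, where output ends just before the first stop sequence. The function name `truncate_at_stop` is hypothetical, and unlike a server-side stop, this does not save the tokens generated after the stop point: you still pay for them.

```python
def truncate_at_stop(text: str, stop_sequences: list[str]) -> str:
    """Cut text just before the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in stop_sequences:
        if not stop:
            continue  # an empty string would match at index 0 and erase everything
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


raw = "def add(a, b):\n    return a + b\n</code>\nHere is an explanation..."
code_only = truncate_at_stop(raw, ["</code>", "###"])
```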
platform.minimax.io/docs/api-reference/text-anthropic-api.md (accessed 2026-06-10)
platform.minimax.io/docs/guides/pricing-paygo.md (accessed 2026-06-10)
platform.minimax.io/docs/guides/models-intro.md (accessed 2026-06-10)
platform.minimax.io/docs/token-plan/claude-code.md (accessed 2026-06-10)
For text-only and tool-use workflows, M2.5 works as a drop-in replacement in code written against the Anthropic SDK. Set ANTHROPIC_BASE_URL to https://api.minimax.io/anthropic, swap your API key, and update the model string to MiniMax-M2.5. The SDK interface, streaming behavior, and tool definition format are the same. The main incompatibilities are the ignored parameters (stop_sequences, top_k) and the always-on thinking output, which adds blocks to the response that your parsing code needs to handle if it currently assumes only text blocks.
M2.5 supports prompt caching. MiniMax offers explicit prompt caching through the Anthropic-compatible interface. Cache read is priced at $0.03 per million tokens and cache write at $0.375 per million tokens for M2.5. MiniMax also has a separate doc page covering explicit cache_control settings via the Anthropic API format (platform.minimax.io/docs/api-reference/anthropic-api-compatible-cache.md).
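As an illustration of what explicit caching looks like in the Anthropic request format, the sketch below marks a long system prompt as cacheable. The `cache_control` shape shown is the one the Anthropic API uses. That MiniMax accepts exactly this shape is an assumption here, so check the cache doc page before relying on it. `build_cached_request` is a hypothetical helper.

```python
def build_cached_request(
    system_prompt: str,
    user_message: str,
    model: str = "MiniMax-M2.5",
    max_tokens: int = 1024,
) -> dict:
    """Build messages.create kwargs with the system prompt marked as a cache breakpoint."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                # Anthropic-format cache marker; MiniMax support for this exact shape is assumed.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_message}],
    }


kwargs = build_cached_request("You are a code reviewer. <long style guide here>", "Review this diff.")
# With a configured client: client.messages.create(**kwargs)
```

At the listed M2.5 prices, a cached prefix costs $0.375 / M tokens to write once and $0.03 / M tokens on each later read, against $0.30 / M tokens every time when uncached. Two requests sharing the same prefix therefore cost $0.405 / M cached versus $0.60 / M uncached, so caching pays off from the first reuse.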
The standard and highspeed variants use the same model weights with a different serving configuration. The highspeed variant targets approximately 100 tokens per second versus 60 tps for the standard variant, at double the input and output price. Use highspeed for latency-sensitive applications like streaming chat interfaces. Use standard for batch processing, long-running agent loops, and any workload where throughput matters more than per-request latency.
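The trade-off is easy to put into numbers. The sketch below uses the approximate throughput targets quoted above and the output prices from the pricing table. It ignores time to first token, queueing, and input cost, so treat the results as rough orders of magnitude. The function name is hypothetical.

```python
# Approximate targets from the text above, not measured values.
TOKENS_PER_SECOND = {"MiniMax-M2.5": 60, "MiniMax-M2.5-highspeed": 100}
# USD per million output tokens, from the pricing table.
OUTPUT_PRICE_PER_MILLION = {"MiniMax-M2.5": 1.20, "MiniMax-M2.5-highspeed": 2.40}


def generation_profile(model: str, output_tokens: int) -> tuple[float, float]:
    """Return (estimated seconds to generate, output cost in USD) for a response."""
    seconds = output_tokens / TOKENS_PER_SECOND[model]
    cost = output_tokens * OUTPUT_PRICE_PER_MILLION[model] / 1_000_000
    return seconds, cost


for name in TOKENS_PER_SECOND:
    seconds, cost = generation_profile(name, 6_000)
    print(f"{name}: ~{seconds:.0f} s, ${cost:.4f}")
```

For a 6,000-token response this works out to roughly 100 seconds on standard versus 60 seconds on highspeed, with the output cost doubling from about $0.0072 to $0.0144.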
For new projects starting today, M3 is worth evaluating first. It is priced identically to M2.5 at standard tier, adds multimodal input, extends the context window to 1M tokens, and supports configurable thinking. M2.5 makes sense if you have an existing codebase already validated against it, if you want a stable legacy target while evaluating M3, or if you are running comparisons for a benchmark study. MiniMax continues to serve M2.5 via API with no announced deprecation timeline as of this writing.