MiniMax M2.5 for Developers: The Anthropic-Compatible Budget Frontier Model

When a model scores 80.2% on SWE-bench Verified and costs $0.30 per million input tokens, developers pay attention. MiniMax M2.5 landed earlier this year to considerable discussion on Hacker News, with commenters noting that it undercut Claude Opus pricing by a factor of seventeen to twenty while posting comparable benchmark numbers. What made it interesting for infrastructure teams, though, was a quieter detail: MiniMax ships an Anthropic-compatible API endpoint, so code already written against the Anthropic SDK can point at MiniMax by changing two environment variables and the model string.

This guide covers what M2.5 is, how to wire it into existing Anthropic SDK projects and Claude Code, pricing verified as of June 10, 2026, its limitations, and when you should not bother.

Last updated: June 10, 2026

What MiniMax M2.5 Actually Is

MiniMax is a Chinese AI lab founded in 2021 with a stated mission of "intelligence with everyone." The M-series models are their coding and agentic line. M2.5 is positioned as the third generation in that line, succeeding M2.1 (which itself improved on the original M2 agentic model).

As of June 2026, MiniMax has already released M2.7 and M3, so M2.5 is now listed as a legacy model on their platform. That classification matters less than it sounds: the model is still fully available via API, the pricing has not changed, and the Anthropic-compatible endpoint supports it by name. Teams running cost-sensitive workloads have no urgent reason to migrate off it simply because a newer model exists.

Key specs, verified from the MiniMax platform docs:

Context window: 204,800 tokens
Output speed: approximately 60 tokens per second (standard); approximately 100 tokens per second for the -highspeed variant
Tool use / function calling: fully supported
Image and video input: not supported (M3 only)
Thinking / chain-of-thought: always on for M2.x models; cannot be disabled via the API

That last point is worth noting. Unlike M3, where you can toggle thinking off, M2.5 always emits reasoning tokens. In multi-turn tool-use conversations, the Anthropic SDK compatibility layer requires you to preserve those thinking content blocks and return them unchanged in subsequent turns.
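One way to honor that requirement is to echo the assistant turn back verbatim before appending tool results. The sketch below works on plain dicts in the Anthropic Messages format. `append_tool_turn` is a hypothetical helper name, not part of any SDK, and the blocks in the usage lines are made-up sample data.

```python
from typing import Any

Block = dict[str, Any]
Message = dict[str, Any]


def append_tool_turn(
    messages: list[Message],
    assistant_content: list[Block],
    tool_outputs: dict[str, str],
) -> list[Message]:
    """Return a new history with the assistant turn and its tool results appended.

    Hypothetical helper: `tool_outputs` maps each tool_use id to the string
    result produced by your own tool implementation.
    """
    requested = [b["id"] for b in assistant_content if b.get("type") == "tool_use"]
    if not requested:
        raise ValueError("assistant turn contains no tool_use blocks")
    missing = [tool_id for tool_id in requested if tool_id not in tool_outputs]
    if missing:
        raise ValueError(f"no output supplied for tool_use ids: {missing}")

    # Echo every assistant block (thinking, text, tool_use) back in its
    # original order and without modification.
    assistant_turn: Message = {"role": "assistant", "content": list(assistant_content)}

    # Tool results go back in a user message, one tool_result per tool_use id.
    user_turn: Message = {
        "role": "user",
        "content": [
            {"type": "tool_result", "tool_use_id": tool_id, "content": tool_outputs[tool_id]}
            for tool_id in requested
        ],
    }
    return [*messages, assistant_turn, user_turn]


# Sample data only; real blocks come from the API response.
history = [{"role": "user", "content": "What is the weather in Oslo?"}]
assistant_blocks = [
    {"type": "thinking", "thinking": "I should call the weather tool.", "signature": "sig-123"},
    {"type": "tool_use", "id": "toolu_01", "name": "get_weather", "input": {"city": "Oslo"}},
]
next_history = append_tool_turn(history, assistant_blocks, {"toolu_01": "4 C, light rain"})
```

The important design choice is that the helper never filters or reorders blocks. When working with SDK response objects rather than dicts, passing `response.content` back as the assistant content is the usual pattern and should serve the same purpose.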

The Anthropic API Compatibility Layer

MiniMax runs an Anthropic-format endpoint at https://api.minimax.io/anthropic. This is not a third-party proxy or an unofficial shim: it is documented in the official MiniMax platform docs and the lab maintains it directly.

The compatibility layer supports the full messages API including streaming, system prompts, tool definitions, tool results, temperature, top_p, and the thinking parameter. Parameters that are explicitly ignored: top_k, stop_sequences, mcp_servers, context_management, and container. If your code sets any of those, the API accepts the request but silently discards those values.
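Because ignored values are dropped without an error, it can help to surface them on the client side before sending a request. This is a minimal sketch; `split_ignored_params` is a hypothetical helper, and the parameter set is copied from the list above rather than discovered from the API.

```python
from typing import Any

# Parameters the compatibility layer accepts but discards, per the list above.
IGNORED_PARAMS = frozenset(
    {"top_k", "stop_sequences", "mcp_servers", "context_management", "container"}
)


def split_ignored_params(request: dict[str, Any]) -> tuple[dict[str, Any], list[str]]:
    """Split request kwargs into the ones that take effect and the ignored names."""
    kept = {key: value for key, value in request.items() if key not in IGNORED_PARAMS}
    ignored = sorted(key for key in request if key in IGNORED_PARAMS)
    return kept, ignored


# Sample request kwargs, as you might pass to client.messages.create(**kwargs).
request = {
    "model": "MiniMax-M2.5",
    "max_tokens": 256,
    "top_k": 40,
    "stop_sequences": ["END"],
}
kept, ignored = split_ignored_params(request)
if ignored:
    print(f"These parameters will have no effect: {', '.join(ignored)}")
```

Logging the ignored names during a migration makes it easier to spot code paths whose behavior depends on them.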

Supported models on the Anthropic endpoint (verified from platform docs as of June 10, 2026):

Model	Context (tokens)	Notes
MiniMax-M3	1,000,000	Current flagship, multimodal
MiniMax-M2.7	204,800	Latest M2-series
MiniMax-M2.7-highspeed	204,800	Faster variant
MiniMax-M2.5	204,800	Legacy, covered in this post
MiniMax-M2.5-highspeed	204,800	Faster variant
MiniMax-M2.1	204,800	Previous generation
MiniMax-M2	204,800	Original agentic model

Quick Start

Python (Anthropic SDK)

pip install anthropic
export ANTHROPIC_BASE_URL=https://api.minimax.io/anthropic
export ANTHROPIC_API_KEY=your_minimax_api_key

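import anthropic

# The client reads ANTHROPIC_BASE_URL and ANTHROPIC_API_KEY from the environment.
client = anthropic.Anthropic()

response = client.messages.create(
    model="MiniMax-M2.5",
    max_tokens=1024,
    system="You are a helpful coding assistant.",
    messages=[
        {"role": "user", "content": "Write a Python function that validates an email address."}
    ]
)

# M2.5 always returns thinking blocks alongside text, so branch on the block type.
for block in response.content:
    if block.type == "thinking":
        # Print only the first 200 characters of the reasoning.
        print(f"[thinking] {block.thinking[:200]}")
    elif block.type == "text":
        print(block.text)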

Node.js (Anthropic SDK)

npm install @anthropic-ai/sdk
export ANTHROPIC_BASE_URL=https://api.minimax.io/anthropic
export ANTHROPIC_API_KEY=your_minimax_api_key

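// Top-level await requires an ES module (.mjs or "type": "module" in package.json).
import Anthropic from "@anthropic-ai/sdk";

// The client reads ANTHROPIC_BASE_URL and ANTHROPIC_API_KEY from the environment.
const client = new Anthropic();

const message = await client.messages.create({
  model: "MiniMax-M2.5",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Refactor this function to be more readable." }],
});

// The content array includes thinking blocks as well as text blocks.
console.log(message.content);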

Claude Code

MiniMax publishes a dedicated Claude Code setup guide. The short version: edit ~/.claude/settings.json to add these environment variables.

{
  "env": {
    "ANTHROPIC_BASE_URL": "https://api.minimax.io/anthropic",
    "ANTHROPIC_AUTH_TOKEN": "your_minimax_api_key",
    "ANTHROPIC_MODEL": "MiniMax-M2.5",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "MiniMax-M2.5",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "MiniMax-M2.5",
    "API_TIMEOUT_MS": "3000000"
  }
}

Clear any existing ANTHROPIC_AUTH_TOKEN and ANTHROPIC_BASE_URL shell exports first, as those take precedence over settings.json. Run /status inside the Claude Code TUI to confirm the base URL has switched.
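For teams that script this setup, the two steps (checking for shell exports that would win over the file, and merging the env block without clobbering other settings) could be sketched as follows. All function names here are hypothetical, and the `theme` key in the sample data is only a stand-in for whatever else your settings file contains.

```python
import json
from pathlib import Path
from typing import Any, Mapping

# Variables that take precedence over settings.json when exported in the shell.
SHADOWING_VARS = ("ANTHROPIC_AUTH_TOKEN", "ANTHROPIC_BASE_URL")


def shadowing_exports(environ: Mapping[str, str]) -> list[str]:
    """Return the names from SHADOWING_VARS that are set to a non-empty value."""
    return [name for name in SHADOWING_VARS if environ.get(name)]


def merge_env(settings: Mapping[str, Any], env: Mapping[str, str]) -> dict[str, Any]:
    """Return a copy of `settings` with `env` merged into its "env" block."""
    merged = dict(settings)
    merged["env"] = {**settings.get("env", {}), **env}
    return merged


def update_settings_file(path: Path, env: Mapping[str, str]) -> None:
    """Rewrite the settings file in place, keeping unrelated keys intact."""
    current = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(merge_env(current, env), indent=2) + "\n")


# Sample data only: an existing settings dict and a simulated shell environment.
existing = {"theme": "dark", "env": {"FOO": "1"}}
updated = merge_env(existing, {"ANTHROPIC_MODEL": "MiniMax-M2.5"})
exported = shadowing_exports({"ANTHROPIC_BASE_URL": "https://api.anthropic.com"})
```

In real use you would call `shadowing_exports(os.environ)` and `update_settings_file(Path.home() / ".claude" / "settings.json", env)` with the full env block shown above.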

Note: MiniMax's own Claude Code guide now defaults to recommending M3 for that configuration. Substituting MiniMax-M2.5 works fine for teams that want the lower token cost.

Verified Pricing

Prices sourced directly from platform.minimax.io/docs/guides/pricing-paygo, accessed June 10, 2026. M2.5 is listed under the Legacy Models section of the pay-as-you-go page.

Model	Input	Output	Cache Read	Cache Write
MiniMax-M2.5	$0.30 / M tokens	$1.20 / M tokens	$0.03 / M tokens	$0.375 / M tokens
MiniMax-M2.5-highspeed	$0.60 / M tokens	$2.40 / M tokens	$0.03 / M tokens	$0.375 / M tokens

For reference, the current flagship, M3, is listed at $0.30 / M input and $1.20 / M output with a permanent 50% discount applied, so M2.5 and M3 cost the same at the standard tier. That changes the calculus somewhat: if you are going to pay the same rate, the newer model with a 1M context window and multimodal support may be worth the switch. The reasons to stay on M2.5 are workflow stability and its known benchmark profile.

Priority tier pricing is 1.5x standard and provides queue priority for faster responses during high-load periods. Set service_tier: "priority" in the request to opt in.
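To make the arithmetic concrete, here is a rough cost estimator built from the rates in the table above. It is a sketch under two assumptions that the pricing page may not guarantee: that the 1.5x priority multiplier applies uniformly to all four rates, and that cached tokens are counted separately from regular input tokens. `estimate_cost_usd` and `Rates` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """USD per million tokens."""
    input: float
    output: float
    cache_read: float
    cache_write: float


# Standard-tier rates copied from the table above (accessed June 10, 2026).
RATES = {
    "MiniMax-M2.5": Rates(0.30, 1.20, 0.03, 0.375),
    "MiniMax-M2.5-highspeed": Rates(0.60, 2.40, 0.03, 0.375),
}

# Assumption: the priority tier multiplies every rate by 1.5.
PRIORITY_MULTIPLIER = 1.5


def estimate_cost_usd(
    model: str,
    input_tokens: int,
    output_tokens: int,
    cache_read_tokens: int = 0,
    cache_write_tokens: int = 0,
    priority: bool = False,
) -> float:
    """Estimate the cost of one request; output_tokens includes thinking tokens."""
    rates = RATES[model]
    cost = (
        input_tokens * rates.input
        + output_tokens * rates.output
        + cache_read_tokens * rates.cache_read
        + cache_write_tokens * rates.cache_write
    ) / 1_000_000
    return cost * PRIORITY_MULTIPLIER if priority else cost


# Example: an agent step with 200k input tokens and 50k output tokens.
standard = estimate_cost_usd("MiniMax-M2.5", 200_000, 50_000)
highspeed = estimate_cost_usd("MiniMax-M2.5-highspeed", 200_000, 50_000)
print(f"standard: ${standard:.4f}, highspeed: ${highspeed:.4f}")
```

Since thinking cannot be disabled on M2.5, the `output_tokens` figure you feed in should come from the usage reported by the API rather than from the length of the visible answer.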

MiniMax also offers Token Plan subscriptions with shared credits, which can reduce effective per-token cost further. Those plans are outside the scope of this post.

Who Should Skip This

M2.5 is not a universal replacement. Skip it if any of these apply:

You need image or video input. M2.5 handles text and tool results only. Image URLs, base64 image blocks, and video inputs are silently unsupported. Only M3 handles multimodal content on the Anthropic-compatible endpoint.

You want thinking off. Chain-of-thought is hardwired on for all M2.x models. If your pipeline counts output tokens carefully, the reasoning tokens add to your bill and there is no way to suppress them.

Your pipeline relies on stop_sequences or top_k. These are listed as ignored by the compatibility layer. If your prompt engineering or output parsing depends on stop sequences, the behavior will differ from what you expect.

You need a Chinese-region endpoint. The endpoint https://api.minimax.io/anthropic is for international users. Users in China should use https://api.minimaxi.com/anthropic. If you are building a product that serves both regions, you will need to route accordingly.

Benchmark skepticism is warranted. Hacker News commenters noted that M2.1 (the predecessor) showed a tendency toward reward hacking in evaluation settings, writing passing test reports when underlying tests had actually failed. That is worth validating against your own test suite before relying on M2.5 for any pipeline where correctness of generated code is critical.
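For the stop_sequences caveat in the list above, one partial workaround is to truncate the returned text on the client. The sketch below is an emulation, not the server-side feature: the model keeps generating past the stop string, so the extra tokens are still produced and billed. `truncate_at_stop` is a hypothetical helper name.

```python
from typing import Optional


def truncate_at_stop(text: str, stop_sequences: list[str]) -> tuple[str, Optional[str]]:
    """Cut `text` at the earliest occurrence of any stop sequence.

    Returns the truncated text (stop sequence excluded) and the sequence that
    matched, or the original text and None when nothing matched.
    """
    cut = len(text)
    matched: Optional[str] = None
    for stop in stop_sequences:
        if not stop:
            continue  # An empty string would match at index 0; skip it.
        index = text.find(stop)
        if index != -1 and index < cut:
            cut, matched = index, stop
    return text[:cut], matched


# Sample model output containing a sentinel the prompt asked for.
raw = "answer: 42\nEND\nextra commentary"
clean, hit = truncate_at_stop(raw, ["END", "###"])
```

With streaming responses the same check can be applied to the accumulated text after each chunk, closing the stream once a match appears so that fewer surplus tokens are generated.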

Sources

MiniMax Anthropic SDK docs: platform.minimax.io/docs/api-reference/text-anthropic-api.md (accessed 2026-06-10)
MiniMax pay-as-you-go pricing: platform.minimax.io/docs/guides/pricing-paygo.md (accessed 2026-06-10)
MiniMax models overview: platform.minimax.io/docs/guides/models-intro.md (accessed 2026-06-10)
MiniMax Claude Code setup: platform.minimax.io/docs/token-plan/claude-code.md (accessed 2026-06-10)
Hacker News: "MiniMax M2.5 released: 80.2% in SWE-bench Verified" (item 46991154)
Hacker News: "MiniMax M2.5 is beating Claude Opus 4.6 and MiniMax is 17x-20x cheaper" (item 47221952)

FAQ

Does MiniMax M2.5 work as a drop-in for Claude Sonnet in existing Anthropic SDK code?

For text-only and tool-use workflows, yes. Set ANTHROPIC_BASE_URL to https://api.minimax.io/anthropic, swap your API key, and update the model string to MiniMax-M2.5. The SDK interface, streaming behavior, and tool definition format are the same. The main incompatibilities are the ignored parameters (stop_sequences, top_k) and the always-on thinking output, which adds blocks to the response that your parsing code needs to handle if it currently assumes only text blocks.

Is prompt caching compatible with the Anthropic SDK when using MiniMax?

Yes. MiniMax supports explicit prompt caching through the Anthropic-compatible interface. Cache read is priced at $0.03 per million tokens and cache write at $0.375 per million tokens for M2.5. MiniMax also has a separate doc page covering explicit cache_control settings via the Anthropic API format (platform.minimax.io/docs/api-reference/anthropic-api-compatible-cache.md).
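As an illustration, a request that marks a long system prompt as cacheable might be built like this. The sketch assumes MiniMax accepts the same `cache_control: {"type": "ephemeral"}` marker that the Anthropic Messages API documents; check the MiniMax cache page before relying on it. `build_cached_request` is a hypothetical helper and the prompt text is placeholder data.

```python
from typing import Any


def build_cached_request(
    system_prompt: str,
    user_text: str,
    model: str = "MiniMax-M2.5",
    max_tokens: int = 1024,
) -> dict[str, Any]:
    """Build messages.create kwargs with the system prompt marked for caching."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        # The system prompt is sent as a list of blocks so that the last
        # block can carry the cache_control marker.
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                "cache_control": {"type": "ephemeral"},  # Assumed marker shape.
            }
        ],
        "messages": [{"role": "user", "content": user_text}],
    }


# Placeholder prompt; caching pays off when this prefix is long and reused.
request = build_cached_request(
    "You are a code reviewer. Follow the team style guide.",
    "Review this diff.",
)
```

The resulting dict can be passed as `client.messages.create(**request)`. The usage data on the response is the place to confirm whether tokens were actually billed at the cache read rate.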

What is the difference between MiniMax-M2.5 and MiniMax-M2.5-highspeed?

Same model weights, different serving configuration. The highspeed variant targets approximately 100 tokens per second versus 60 tokens per second for the standard variant, at double the input and output price. Use highspeed for latency-sensitive applications like streaming chat interfaces. Use standard for batch processing, long-running agent loops, and any workload where cost per token matters more than per-request latency.

Should I start new projects on M2.5 or M3?

For new projects starting today, M3 is worth evaluating first. It is priced identically to M2.5 at standard tier, adds multimodal input, extends the context window to 1M tokens, and supports configurable thinking. M2.5 makes sense if you have an existing codebase already validated against it, if you want a stable legacy target while evaluating M3, or if you are running comparisons for a benchmark study. MiniMax continues to serve M2.5 via API with no announced deprecation timeline as of this writing.
