DeepSeek Retires deepseek-chat and deepseek-reasoner on July 24: Your V4 Migration Guide

Q: Is deepseek-chat deprecated?

Yes. The legacy model names `deepseek-chat` and `deepseek-reasoner` were fully retired on July 24, 2026 at 15:59 UTC. Requests using those names now return errors. The replacement is `deepseek-v4-flash` (or `deepseek-v4-pro` for the higher tier).

Last updated: July 25, 2026

If you have deepseek-chat or deepseek-reasoner anywhere in your codebase, you have a hard deadline. DeepSeek's V4 announcement states that both legacy model names "will be fully retired and inaccessible after Jul 24th, 2026, 15:59 (UTC Time)." The official pricing page confirms the same cutoff: deprecation lands at 2026/07/24 15:59 UTC.

The good news: this is one of the gentler migrations in recent memory. The base URL does not change, both legacy names already route to the new model, and the replacement is cheaper per token than almost anything else on the market. The catch is that thinking mode moved from a model name to a request parameter, and a few legacy deepseek-reasoner quirks need cleanup on the way through. This guide is the what-maps-to-what checklist.

What Changed on July 25#

The deprecation deadline has passed. As of July 24, 2026 at 15:59 UTC, the legacy deepseek-chat and deepseek-reasoner model names stopped resolving. Requests using those names now return errors rather than routing to V4-Flash. Prices are unchanged from the June data in this guide (verified July 25 against DeepSeek's live pricing page): V4-Flash at $0.14/$0.28 per MTok, V4-Pro at $0.435/$0.87 per MTok.

If your code was still using the legacy names after the cutoff, every API call is failing. The fix is a one-line model name swap in each request. No base URL change, no API key rotation, no endpoint migration.

What Actually Happens on July 24#

Since the V4 preview shipped on April 24, 2026, the legacy names have been aliases. Per the official announcement, deepseek-chat and deepseek-reasoner are "currently routing to deepseek-v4-flash non-thinking/thinking" respectively. So if your app still sends the old names, you have already been on V4-Flash for weeks - the July 24 date is when the aliases stop resolving entirely.

That means two separate facts worth internalizing:

Your model already changed. If you noticed output drift since late April, this routing is why. Any evals you ran against the old V3-era models no longer describe what you are serving.
Your code will hard-break on July 24. After 15:59 UTC, requests using the legacy names become errors, not redirects.

DeepSeek's own migration advice is one line: "Keep base_url, just update model to deepseek-v4-pro or deepseek-v4-flash." The endpoint stays https://api.deepseek.com, and the API remains compatible with the OpenAI ChatCompletions format, with an Anthropic-compatible surface as a second option (more on that below).

What Maps to What#

The mapping is mechanical for Flash, with one structural change: thinking is no longer a separate model name.

You have today	It routes to	What to write instead
`deepseek-chat`	DeepSeek-V4-Flash, non-thinking mode	`model: "deepseek-v4-flash"`
`deepseek-reasoner`	DeepSeek-V4-Flash, thinking mode	`model: "deepseek-v4-flash"` plus `thinking: {"type": "enabled"}`
(no legacy equivalent)	DeepSeek-V4-Pro, either mode	`model: "deepseek-v4-pro"`, thinking optional

Per the API quick start, thinking mode on V4 models is controlled by a request parameter, not a model suffix. With the OpenAI Python SDK that looks like extra_body={"thinking": {"type": "enabled"}}; in the Node.js SDK, thinking passes as a direct parameter. The docs also show a reasoning_effort parameter (for example "high") for tuning how hard the model thinks.

Note what is new here: V4-Pro had no legacy alias at all. The retirement only forces you onto V4-Flash equivalents. Moving up to Pro is a deliberate choice, which is what the next section is for.

V4-Flash vs V4-Pro: Head-to-Head#

All pricing below is from the official pricing page, verified July 25, 2026 (unchanged from the original June launch). Benchmark figures are from Artificial Analysis and DataCamp, accessed June 11, 2026. Open-weights download sizes are from the DeepSeek-V4-Flash and DeepSeek-V4-Pro Hugging Face repositories.

	DeepSeek-V4-Flash	DeepSeek-V4-Pro
Parameters	284B total / 13B active	1.6T total / 49B active
Open-weights download	160GB	865GB
Context window	1M tokens	1M tokens
Max output	384K tokens	384K tokens
Input, cache miss	$0.14 / MTok	$0.435 / MTok
Input, cache hit	$0.0028 / MTok	$0.003625 / MTok
Output	$0.28 / MTok	$0.87 / MTok
Concurrency limit	2,500 requests	500 requests
AA Intelligence Index	47	52
GDPval-AA (agentic)	1,388	1,554
Cost to run full AA Index	$113	$1,071
License	MIT	MIT

Both models support thinking and non-thinking modes, JSON output, tool calls, and chat prefix completion (beta). FIM completion works in non-thinking mode only.

One number worth dwelling on: the 1M-token context window is standard on both models, which Artificial Analysis notes is an 8x expansion from V3.2's 128K. If you built chunking or context-compression layers to live inside the old window, the migration is a chance to delete code.

A pricing caveat, because honesty beats tidiness: launch-window coverage from Artificial Analysis and DataCamp listed V4-Pro at $1.74 input / $3.48 output per million tokens. The live pricing page as verified on July 25, 2026 lists the lower $0.435 / $0.87 figures above, unchanged since the original guide published in June. Treat the official page as canonical and re-check it before you commit a budget, since the numbers moved once and may move again.

From the archive

Fable 5 with 1M Context: What Actually Works in Practice

Jun 11, 2026 • 10 min read

Fable 5 Effort Levels Explained: low to xhigh, and What They Cost You

Jun 11, 2026 • 10 min read

Handling Long-Running Fable 5 Requests: Timeouts, Streaming, and Background Patterns

Jun 11, 2026 • 8 min read

Setting Up the Memory Tool with Fable 5: Persistent Agents That Learn

Jun 11, 2026 • 8 min read

The Migration Checklist#

Work through these in order. For most codebases this is an afternoon, not a sprint.

1. Find every legacy reference#

Grep for both names across code, config, env files, and infrastructure-as-code. The sneaky ones live in evaluation harnesses, fallback chains, and notebooks.

2. Swap the model names#

deepseek-chat becomes deepseek-v4-flash. deepseek-reasoner becomes deepseek-v4-flash with thinking: {"type": "enabled"} in the request body. Base URL stays https://api.deepseek.com.

3. Keep stripping reasoning content in multi-turn flows#

The reasoning model guide documents that the chain of thought comes back in a separate reasoning_content field and must be removed from assistant messages before you send the conversation back - including it triggers errors. If your deepseek-reasoner integration already handles this, keep that code. If you are adding thinking mode for the first time, build this in from day one.

4. Revisit max_tokens#

Legacy deepseek-reasoner defaulted to 32K output with a 64K maximum, including the chain of thought, per the reasoning model guide. V4 models support up to 384K output tokens per the pricing page. If you hardcoded conservative limits to dodge truncation, you have new headroom - but remember output is the expensive direction, so raise limits deliberately.

5. Re-test tool calls in thinking mode#

Legacy deepseek-reasoner did not support function calling at all. On V4, the pricing page lists tool call support on both models. If you previously split traffic between a reasoning model and a tool-calling model to work around this, you can likely collapse that into a single V4-Flash call with thinking enabled. That is a genuine architectural simplification, so re-run your agent evals to confirm.

6. Audit your cache hit rate#

The cache-hit discount on V4 is dramatic: $0.0028 versus $0.14 per million input tokens on Flash, verified July 25, 2026 on the pricing page. Stable system prompts and shared context prefixes pay for themselves at this ratio. We dug into cache-first agent design in DeepSeek V4 for budget coding agents if you want patterns.

7. Check your concurrency assumptions#

V4-Flash allows 2,500 concurrent requests; V4-Pro allows 500. If you are migrating a high-throughput pipeline and considering the Pro upgrade at the same time, the 5x concurrency difference may matter more than the per-token price difference.

8. Decide whether the Anthropic format helps you#

DeepSeek runs a second API surface at https://api.deepseek.com/anthropic that speaks the Anthropic Messages format, per the Anthropic API guide. You point the Anthropic SDK at it with ANTHROPIC_BASE_URL and your DeepSeek key in ANTHROPIC_API_KEY. Model name mapping is automatic: claude-opus variants map to deepseek-v4-pro, claude-sonnet and claude-haiku variants map to deepseek-v4-flash, and anything unmapped defaults to V4-Flash. Streaming, temperature, stop sequences, top_p, tool use, web search result content, and thinking mode are supported, though budget_tokens is ignored. Images, documents, code execution results, and MCP content types are not supported through this bridge.

Decision Guide: Which Path for Which Team#

High-volume, cost-sensitive API users. Move to deepseek-v4-flash non-thinking and stop there. It is the literal successor to deepseek-chat, the behavior change already happened in April, and at $0.14/$0.28 per million tokens (verified July 25, 2026) it sits at the budget floor of the frontier market. Our frontier pricing notes cover why open-weights economics keep this floor low.

Reasoning-heavy pipelines. Migrate to V4-Flash with thinking enabled first and measure. Only step up to V4-Pro if quality gates fail. Pro's AA Intelligence Index lead (52 vs 47) is real but costs roughly 3x per token, and the full V4 developer guide covers where the gap shows up in practice.

Teams standardized on the Anthropic SDK or Claude Code. The Anthropic-compatible endpoint means migration can be a two-environment-variable change. The automatic claude-to-deepseek model mapping is convenient, but pin explicit deepseek-v4-pro / deepseek-v4-flash names in production so a mapping change upstream cannot silently swap your model. If you are weighing DeepSeek against staying on Claude entirely, see our Fable 5 vs DeepSeek V4 cost-quality comparison.

Self-hosters. Both models are MIT-licensed with weights on Hugging Face (linked from the official announcement). The API retirement does not touch self-hosted deployments. Flash at 160GB is the realistic target for most setups; Pro's 865GB download is datacenter territory.

When to Do Nothing, and When to Leave#

There is no "stay on the legacy endpoints" option here - July 24 is a hard wall. But there are two honest non-migration paths.

Doing nothing until July is mostly fine, with one asterisk. Because the aliases already route to V4-Flash, swapping names today changes nothing about behavior. If your team is heads-down on something else, scheduling this for mid-July costs you only deadline risk. The asterisk: you should still re-run evals now, because the model behind your existing calls already changed in April whether you migrated or not.

Leaving DeepSeek entirely makes sense in a few specific cases. V4 is text-in, text-out only, per Artificial Analysis - if your roadmap needs vision or document understanding in the same call, this migration window is a natural moment to consolidate on a multimodal provider. The same goes if you depend on FIM completion inside thinking-mode flows (FIM is non-thinking only), or if data-governance requirements rule out the hosted API and you cannot run 160GB of weights yourself. Those are real reasons. Per-token price is not one of them, because nothing hosted is meaningfully cheaper right now.

If you are coming from even older code - R1-era patterns and V3 assumptions - our earlier DeepSeek R1 and V3 guide is a useful snapshot of how much the platform has shifted under the same base URL.

FAQ#

Is deepseek-chat deprecated?#

Yes. The legacy model names deepseek-chat and deepseek-reasoner were fully retired on July 24, 2026 at 15:59 UTC. Requests using those names now return errors. The replacement is deepseek-v4-flash (or deepseek-v4-pro for the higher tier).

What replaces deepseek-reasoner after the July 24 deprecation?#

DeepSeek-V4-Flash with thinking mode enabled. Instead of selecting a separate reasoning model, you send model: "deepseek-v4-flash" with thinking: {"type": "enabled"} in the request body. A reasoning_effort parameter tunes thinking depth, and V4-Pro offers the same dual modes at a higher quality tier.

Will my API calls break now that the deadline passed?#

If your code still uses the legacy model names, yes - requests will fail with errors. The base URL, API key, and OpenAI-compatible request format all stay the same, so the fix is updating the model name string and, for reasoner traffic, adding the thinking parameter. This is a one-line change per request.

How much does DeepSeek V4 cost compared to other frontier APIs?#

Verified July 25, 2026 on the official pricing page: V4-Flash is $0.14 input / $0.28 output per million tokens ($0.0028 on cache hits), and V4-Pro is $0.435 / $0.87 ($0.003625 on cache hits). Both include a 1M-token context window and up to 384K output tokens. Prices are unchanged since the June launch across all tiers.

Can I use DeepSeek V4 with the Anthropic SDK?#

Yes. DeepSeek serves an Anthropic-compatible API at https://api.deepseek.com/anthropic. Set ANTHROPIC_BASE_URL to that endpoint and put your DeepSeek key in ANTHROPIC_API_KEY. Claude model names map automatically (opus-class to V4-Pro, sonnet and haiku-class to V4-Flash), though images, documents, and MCP content types are not supported through the bridge.

Official Sources#

Source	Link	Type	Verified
DeepSeek API Pricing & Deprecation	https://api-docs.deepseek.com/quick_start/pricing	Official Docs	July 25, 2026
DeepSeek V4 Release Announcement	https://api-docs.deepseek.com/news/news260424	Official Docs	July 25, 2026
DeepSeek API Quick Start	https://api-docs.deepseek.com/	Official Docs	July 25, 2026
DeepSeek Anthropic-Compatible API	https://api-docs.deepseek.com/guides/anthropic_api	Official Docs	July 25, 2026
DeepSeek Reasoning Model Guide	https://api-docs.deepseek.com/guides/reasoning_model	Official Docs	July 25, 2026
DeepSeek-V4-Flash on Hugging Face	https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash	Open Weights	July 25, 2026
DeepSeek-V4-Pro on Hugging Face	https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro	Open Weights	July 25, 2026
Artificial Analysis: DeepSeek V4 Benchmarks	https://artificialanalysis.ai/articles/deepseek-is-back-among-the-leading-open-weights-models-with-v4-pro-and-v4-flash	Third-Party	July 25, 2026
DataCamp: DeepSeek V4 Overview	https://www.datacamp.com/blog/deepseek-v4	Third-Party	July 25, 2026

Continue Reading#

Frontier Model API Pricing (July 2026) - live pricing snapshot across all major providers
DeepSeek V4 Developer Guide - deep dive into V4 capabilities and API usage
DeepSeek V4 for Budget Coding Agents - cost-optimized patterns for agentic workloads
Fable 5 vs DeepSeek V4 Cost-Quality Comparison - head-to-head benchmark analysis

Last updated: July 25, 2026

What Changed on July 25#

What Actually Happens on July 24#

That means two separate facts worth internalizing:

Your model already changed. If you noticed output drift since late April, this routing is why. Any evals you ran against the old V3-era models no longer describe what you are serving.
Your code will hard-break on July 24. After 15:59 UTC, requests using the legacy names become errors, not redirects.

What Maps to What#

The mapping is mechanical for Flash, with one structural change: thinking is no longer a separate model name.

You have today	It routes to	What to write instead
`deepseek-chat`	DeepSeek-V4-Flash, non-thinking mode	`model: "deepseek-v4-flash"`
`deepseek-reasoner`	DeepSeek-V4-Flash, thinking mode	`model: "deepseek-v4-flash"` plus `thinking: {"type": "enabled"}`
(no legacy equivalent)	DeepSeek-V4-Pro, either mode	`model: "deepseek-v4-pro"`, thinking optional

Note what is new here: V4-Pro had no legacy alias at all. The retirement only forces you onto V4-Flash equivalents. Moving up to Pro is a deliberate choice, which is what the next section is for.

V4-Flash vs V4-Pro: Head-to-Head#

	DeepSeek-V4-Flash	DeepSeek-V4-Pro
Parameters	284B total / 13B active	1.6T total / 49B active
Open-weights download	160GB	865GB
Context window	1M tokens	1M tokens
Max output	384K tokens	384K tokens
Input, cache miss	$0.14 / MTok	$0.435 / MTok
Input, cache hit	$0.0028 / MTok	$0.003625 / MTok
Output	$0.28 / MTok	$0.87 / MTok
Concurrency limit	2,500 requests	500 requests
AA Intelligence Index	47	52
GDPval-AA (agentic)	1,388	1,554
Cost to run full AA Index	$113	$1,071
License	MIT	MIT

Both models support thinking and non-thinking modes, JSON output, tool calls, and chat prefix completion (beta). FIM completion works in non-thinking mode only.

From the archive

Fable 5 with 1M Context: What Actually Works in Practice

Jun 11, 2026 • 10 min read

Fable 5 Effort Levels Explained: low to xhigh, and What They Cost You

Jun 11, 2026 • 10 min read

Handling Long-Running Fable 5 Requests: Timeouts, Streaming, and Background Patterns

Jun 11, 2026 • 8 min read

Setting Up the Memory Tool with Fable 5: Persistent Agents That Learn

Jun 11, 2026 • 8 min read