TL;DR
deepseek-chat is deprecated and disappears July 24, 2026 - here is how to migrate to V4 Flash or Pro, with verified pricing, thinking-mode mapping, and a step-by-step checklist.
Read next
DeepSeek V4 splits into Flash and Pro, ships a 1M context window, and undercuts every closed model on price. Here's how to wire it up with the OpenAI SDK, when to pick it over Claude or GPT, and what changed since V3 and R1.
10 min readDeepSeek's R1 and V3 models deliver frontier-level performance under an MIT license. Here's how to use them through the API, run them locally with Ollama, and decide when they beat closed-source alternatives.
9 min readA hands-on look at Mastra, the open source TypeScript framework for building production-ready AI agents and workflows -- with verified setup commands, honest tradeoffs, and current pricing.
8 min readLast updated: June 11, 2026
If you have deepseek-chat or deepseek-reasoner anywhere in your codebase, you have a hard deadline. DeepSeek's V4 announcement states that both legacy model names "will be fully retired and inaccessible after Jul 24th, 2026, 15:59 (UTC Time)." The official pricing page confirms the same cutoff: deprecation lands at 2026/07/24 15:59 UTC.
The good news: this is one of the gentler migrations in recent memory. The base URL does not change, both legacy names already route to the new model, and the replacement is cheaper per token than almost anything else on the market. The catch is that thinking mode moved from a model name to a request parameter, and a few legacy deepseek-reasoner quirks need cleanup on the way through. This guide is the what-maps-to-what checklist.
Since the V4 preview shipped on April 24, 2026, the legacy names have been aliases. Per the official announcement, deepseek-chat and deepseek-reasoner are "currently routing to deepseek-v4-flash non-thinking/thinking" respectively. So if your app still sends the old names, you have already been on V4-Flash for weeks - the July 24 date is when the aliases stop resolving entirely.
That means two separate facts worth internalizing:
DeepSeek's own migration advice is one line: "Keep base_url, just update model to deepseek-v4-pro or deepseek-v4-flash." The endpoint stays https://api.deepseek.com, and the API remains compatible with the OpenAI ChatCompletions format, with an Anthropic-compatible surface as a second option (more on that below).
The mapping is mechanical for Flash, with one structural change: thinking is no longer a separate model name.
| You have today | It routes to | What to write instead |
|---|---|---|
deepseek-chat | DeepSeek-V4-Flash, non-thinking mode | model: "deepseek-v4-flash" |
deepseek-reasoner | DeepSeek-V4-Flash, thinking mode | model: "deepseek-v4-flash" plus thinking: {"type": "enabled"} |
| (no legacy equivalent) | DeepSeek-V4-Pro, either mode | model: "deepseek-v4-pro", thinking optional |
Per the API quick start, thinking mode on V4 models is controlled by a request parameter, not a model suffix. With the OpenAI Python SDK that looks like extra_body={"thinking": {"type": "enabled"}}; in the Node.js SDK, thinking passes as a direct parameter. The docs also show a reasoning_effort parameter (for example "high") for tuning how hard the model thinks.
Note what is new here: V4-Pro had no legacy alias at all. The retirement only forces you onto V4-Flash equivalents. Moving up to Pro is a deliberate choice, which is what the next section is for.
All pricing below is from the official pricing page, verified June 11, 2026. Benchmark figures are from Artificial Analysis and DataCamp, accessed June 11, 2026. Open-weights download sizes are from the DeepSeek-V4-Flash and DeepSeek-V4-Pro Hugging Face repositories.
| DeepSeek-V4-Flash | DeepSeek-V4-Pro | |
|---|---|---|
| Parameters | 284B total / 13B active | 1.6T total / 49B active |
| Open-weights download | 160GB | 865GB |
| Context window | 1M tokens | 1M tokens |
| Max output | 384K tokens | 384K tokens |
| Input, cache miss | $0.14 / MTok | $0.435 / MTok |
| Input, cache hit | $0.0028 / MTok | $0.003625 / MTok |
| Output | $0.28 / MTok | $0.87 / MTok |
| Concurrency limit | 2,500 requests | 500 requests |
| AA Intelligence Index | 47 | 52 |
| GDPval-AA (agentic) | 1,388 | 1,554 |
| Cost to run full AA Index | $113 | $1,071 |
| License | MIT | MIT |
Both models support thinking and non-thinking modes, JSON output, tool calls, and chat prefix completion (beta). FIM completion works in non-thinking mode only.
One number worth dwelling on: the 1M-token context window is standard on both models, which Artificial Analysis notes is an 8x expansion from V3.2's 128K. If you built chunking or context-compression layers to live inside the old window, the migration is a chance to delete code.
A pricing caveat, because honesty beats tidiness: launch-window coverage from Artificial Analysis and DataCamp listed V4-Pro at $1.74 input / $3.48 output per million tokens. The live pricing page as verified on June 11, 2026 lists the lower $0.435 / $0.87 figures above. Treat the official page as canonical and re-check it before you commit a budget, since the numbers have clearly moved since April.
Get the weekly deep dive
Tutorials on Claude Code, AI agents, and dev tools - delivered free every week.
From the archive
Jun 11, 2026 • 10 min read
Jun 11, 2026 • 10 min read
Jun 11, 2026 • 8 min read
Jun 11, 2026 • 8 min read
Work through these in order. For most codebases this is an afternoon, not a sprint.
Grep for both names across code, config, env files, and infrastructure-as-code. The sneaky ones live in evaluation harnesses, fallback chains, and notebooks.
deepseek-chat becomes deepseek-v4-flash. deepseek-reasoner becomes deepseek-v4-flash with thinking: {"type": "enabled"} in the request body. Base URL stays https://api.deepseek.com.
The reasoning model guide documents that the chain of thought comes back in a separate reasoning_content field and must be removed from assistant messages before you send the conversation back - including it triggers errors. If your deepseek-reasoner integration already handles this, keep that code. If you are adding thinking mode for the first time, build this in from day one.
Legacy deepseek-reasoner defaulted to 32K output with a 64K maximum, including the chain of thought, per the reasoning model guide. V4 models support up to 384K output tokens per the pricing page. If you hardcoded conservative limits to dodge truncation, you have new headroom - but remember output is the expensive direction, so raise limits deliberately.
Legacy deepseek-reasoner did not support function calling at all. On V4, the pricing page lists tool call support on both models. If you previously split traffic between a reasoning model and a tool-calling model to work around this, you can likely collapse that into a single V4-Flash call with thinking enabled. That is a genuine architectural simplification, so re-run your agent evals to confirm.
The cache-hit discount on V4 is dramatic: $0.0028 versus $0.14 per million input tokens on Flash, verified June 11, 2026 on the pricing page. Stable system prompts and shared context prefixes pay for themselves at this ratio. We dug into cache-first agent design in DeepSeek V4 for budget coding agents if you want patterns.
V4-Flash allows 2,500 concurrent requests; V4-Pro allows 500. If you are migrating a high-throughput pipeline and considering the Pro upgrade at the same time, the 5x concurrency difference may matter more than the per-token price difference.
DeepSeek runs a second API surface at https://api.deepseek.com/anthropic that speaks the Anthropic Messages format, per the Anthropic API guide. You point the Anthropic SDK at it with ANTHROPIC_BASE_URL and your DeepSeek key in ANTHROPIC_API_KEY. Model name mapping is automatic: claude-opus variants map to deepseek-v4-pro, claude-sonnet and claude-haiku variants map to deepseek-v4-flash, and anything unmapped defaults to V4-Flash. Streaming, temperature, stop sequences, top_p, tool use, web search result content, and thinking mode are supported, though budget_tokens is ignored. Images, documents, code execution results, and MCP content types are not supported through this bridge.
High-volume, cost-sensitive API users. Move to deepseek-v4-flash non-thinking and stop there. It is the literal successor to deepseek-chat, the behavior change already happened in April, and at $0.14/$0.28 per million tokens (verified June 11, 2026) it sits at the budget floor of the frontier market. Our frontier pricing notes cover why open-weights economics keep this floor low.
Reasoning-heavy pipelines. Migrate to V4-Flash with thinking enabled first and measure. Only step up to V4-Pro if quality gates fail. Pro's AA Intelligence Index lead (52 vs 47) is real but costs roughly 3x per token, and the full V4 developer guide covers where the gap shows up in practice.
Teams standardized on the Anthropic SDK or Claude Code. The Anthropic-compatible endpoint means migration can be a two-environment-variable change. The automatic claude-to-deepseek model mapping is convenient, but pin explicit deepseek-v4-pro / deepseek-v4-flash names in production so a mapping change upstream cannot silently swap your model. If you are weighing DeepSeek against staying on Claude entirely, see our Fable 5 vs DeepSeek V4 cost-quality comparison.
Self-hosters. Both models are MIT-licensed with weights on Hugging Face (linked from the official announcement). The API retirement does not touch self-hosted deployments. Flash at 160GB is the realistic target for most setups; Pro's 865GB download is datacenter territory.
There is no "stay on the legacy endpoints" option here - July 24 is a hard wall. But there are two honest non-migration paths.
Doing nothing until July is mostly fine, with one asterisk. Because the aliases already route to V4-Flash, swapping names today changes nothing about behavior. If your team is heads-down on something else, scheduling this for mid-July costs you only deadline risk. The asterisk: you should still re-run evals now, because the model behind your existing calls already changed in April whether you migrated or not.
Leaving DeepSeek entirely makes sense in a few specific cases. V4 is text-in, text-out only, per Artificial Analysis - if your roadmap needs vision or document understanding in the same call, this migration window is a natural moment to consolidate on a multimodal provider. The same goes if you depend on FIM completion inside thinking-mode flows (FIM is non-thinking only), or if data-governance requirements rule out the hosted API and you cannot run 160GB of weights yourself. Those are real reasons. Per-token price is not one of them, because nothing hosted is meaningfully cheaper right now.
If you are coming from even older code - R1-era patterns and V3 assumptions - our earlier DeepSeek R1 and V3 guide is a useful snapshot of how much the platform has shifted under the same base URL.
Yes. The official pricing page states the model names deepseek-chat and deepseek-reasoner will be deprecated on 2026/07/24 at 15:59 UTC, and the V4 announcement says they become fully inaccessible after that time. Both names currently route to DeepSeek-V4-Flash, so the underlying model already changed in April 2026.
DeepSeek-V4-Flash with thinking mode enabled. Instead of selecting a separate reasoning model, you send model: "deepseek-v4-flash" with thinking: {"type": "enabled"} in the request body. A reasoning_effort parameter tunes thinking depth, and V4-Pro offers the same dual modes at a higher quality tier.
If they still use the legacy model names, yes - requests will fail after 15:59 UTC on July 24, 2026. The base URL, API key, and OpenAI-compatible request format all stay the same, so the fix is updating the model name string and, for reasoner traffic, adding the thinking parameter.
Verified June 11, 2026 on the official pricing page: V4-Flash is $0.14 input / $0.28 output per million tokens ($0.0028 on cache hits), and V4-Pro is $0.435 / $0.87 ($0.003625 on cache hits). Both include a 1M-token context window and up to 384K output tokens. That places Flash at the cheapest end of the current frontier-adjacent market by a wide margin.
Yes. DeepSeek serves an Anthropic-compatible API at https://api.deepseek.com/anthropic. Set ANTHROPIC_BASE_URL to that endpoint and put your DeepSeek key in ANTHROPIC_API_KEY. Claude model names map automatically (opus-class to V4-Pro, sonnet and haiku-class to V4-Flash), though images, documents, and MCP content types are not supported through the bridge.
Technical content at the intersection of AI and development. Building with AI agents, Claude Code, and modern dev tools - then showing you exactly how it works.
Open-source terminal agent runtime with approval modes, rollback snapshots, MCP servers, LSP diagnostics, and a headless...
View ToolOpen-source reasoning models from China. DeepSeek-R1 rivals o1 on math and code benchmarks. V3 for general use. Fully op...
View ToolDeepSeek's reasoning-first model built for agents. First model to integrate thinking directly into tool use. Ships along...
View ToolFastest inference for open-source models. 200+ models via unified API. Ranks #1 on speed benchmarks for DeepSeek, Qwen,...
View ToolTrack open-source maintenance signals, release tasks, and repo follow-ups in one dashboard.
View AppSchema browser, migration planner, RLS auditor, and SQL notebook for Postgres. Built for Neon, Supabase, or bare Postgres.
View AppBeat the August 2026 Assistants API sunset. Paste old code, get Responses API.
View AppStep-by-step guide to building an MCP server in TypeScript - from project setup to tool definitions, resource handling, testing, and deployment.
AI AgentsA concrete step-by-step guide to moving your development workflow from Cursor to Claude Code - settings, rules, keybindings, and the habits that transfer.
Getting StartedInstall the dd CLI and scaffold your first AI-powered app in under a minute.
Getting StartedGPT-5.4 vs Gemini 3.1 Pro vs DeepSeek V4: pricing, benchmarks, context behavior, and license terms for the mid-tier mode...
Google released DiffusionGemma today, a 26B MoE open model that generates entire 256-token blocks in parallel instead of...
DeepSeek V4-Flash costs $0.28 per million output tokens. Fable 5 costs $50. That 178x gap is real - but so is the qualit...
A hands-on look at Mastra, the open source TypeScript framework for building production-ready AI agents and workflows --...
Windsurf is now Devin Desktop, owned by Cognition after a turbulent 2025 acquisition saga. If the ownership shuffle has...
Fable 5 is mostly a drop-in replacement for Opus 4.8, but 'mostly' is doing real work in that sentence. Here's every bre...

New tutorials, open-source projects, and deep dives on coding agents - delivered weekly.