
TL;DR
OpenAI is sunsetting the Assistants API in 2026. Here is a tested migration plan to the Responses API - code, state, threads, tools, every cliff I hit, in order.
OpenAI confirmed the Assistants API sunset in the developer changelog: new endpoints frozen now, full shutdown in 2026. Threads, runs, run-steps, and the assistant resource itself all go away. Files and vector stores survive (they moved into the Responses API surface). Function calling survives but the schema is slightly different. The Code Interpreter and File Search tools survive as built-in tools on Responses.
For the design side of the same problem, read OpenAI Codex: Cloud AI Coding With GPT-5.3 and OpenAI vs Anthropic in 2026 - Models, Tools, and Developer Experience; they show how agent-generated interfaces fail and how to give coding agents better visual constraints.
If you are running production code against client.beta.threads.* today, you have homework. I had a 14-month-old Assistants codebase running newsletter automation, customer support triage, and a chunk of internal ops. Last weekend I migrated all of it. This is the field guide - every cliff I hit, in order, with the code diffs that worked.
For the visual walkthrough including the eval harness I used to gate the cutover, see the DevDigest YouTube channel.
The Assistants API was server-stateful. You created a thread, posted messages, kicked off runs, polled for completion, and OpenAI held the conversation history. Your code did not own the state.
The Responses API is client-stateful by default, server-stateful by opt-in. Each call returns a response.id. You pass previous_response_id on the next call to get continuity. The server stores the chain for 30 days. After that, you reconstruct from your own DB or pass the message array explicitly.
This is the right design - server-only state was a footgun for compliance, debugging, and multi-region - but it changes how you think about every conversation:
| Assistants | Responses |
|---|---|
| threads.create() | nothing - just call responses.create |
| threads.messages.create() | include in input array |
| runs.create() + poll | responses.create() returns synchronously or streams |
| run.required_action | response.required_action (similar but flatter) |
| assistants.create() | prompts + system messages + tools per call |
The big mental shift: there is no assistant object anymore. The "assistant" is your prompt template + tool list + model config, which you supply per call. This is why I version mine in Promptlock - the prompt is now a first-class artifact in your repo, not a row in OpenAI's database.
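Promptlock is one way to do that, but even a plain module in your repo gets you the property that matters: the assistant config becomes diffable and reviewable. A minimal sketch, with every name here mine rather than from any SDK:
// prompts/supportTriage.js - the "assistant" is now a repo artifact
export const supportTriage = {
  version: "2026-05-01", // bump on every prompt change so evals can pin it
  model: "gpt-5.5",
  instructions: "You are a support triage agent. Classify, route, draft replies.",
  tools: [], // your function tool definitions
};
// per call: const { version, ...config } = supportTriage;
// await client.responses.create({ ...config, input: userMessage });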
Here is the minimal-diff before/after for a single conversation turn. The "before" is the standard Assistants pattern most of us wrote in 2024:
// BEFORE - Assistants API
const thread = await client.beta.threads.create();
await client.beta.threads.messages.create(thread.id, {
role: "user",
content: userMessage,
});
const run = await client.beta.threads.runs.createAndPoll(thread.id, {
assistant_id: ASSISTANT_ID,
});
const messages = await client.beta.threads.messages.list(thread.id);
const reply = messages.data[0].content[0].text.value;
// AFTER - Responses API
const response = await client.responses.create({
model: "gpt-5.5",
instructions: SYSTEM_PROMPT,
input: userMessage,
tools: TOOLS,
previous_response_id: priorResponseId, // null on first turn
store: true, // 30-day server retention
});
const reply = response.output_text;
const newResponseId = response.id; // persist for next turn
The "after" version is shorter, synchronous on the happy path, and the conversation chain lives in two places you control: your DB row (the response.id) and your prompt repo (SYSTEM_PROMPT).
Managing conversation state is where I lost the most time. Three patterns I now use:
Pattern 1: Short-lived chains (default). Persist previous_response_id against your conversation row. On each turn, pass it. Trust OpenAI's 30-day retention. This is what most apps want.
await db.conversation.update({
where: { id: convId },
data: { lastResponseId: response.id },
});
Pattern 2: Long-lived or compliance-bound chains. Do not rely on server retention. Store every message in your DB and pass them explicitly:
const response = await client.responses.create({
model: "gpt-5.5",
instructions: SYSTEM_PROMPT,
input: messages.map((m) => ({ role: m.role, content: m.content })),
store: false, // do not retain server-side
});
Pattern 3: Hybrid. Short-lived state via previous_response_id, but you also write every input/output to your DB for replay and eval purposes. This is what I run in production. It is the only pattern that gives you both ergonomic continuity and full-control debugging.
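Here is a sketch of that hybrid turn. It assumes the db.conversation row from Pattern 1 plus a db.turn history table; both are my schema, not anything the SDK provides:
// Pattern 3 - chain via previous_response_id, mirror every turn to your DB
async function hybridTurn(convId, userMessage) {
  const conv = await db.conversation.findUnique({ where: { id: convId } });
  const response = await client.responses.create({
    model: "gpt-5.5",
    instructions: SYSTEM_PROMPT,
    input: userMessage,
    previous_response_id: conv.lastResponseId, // null on first turn
    store: true,
  });
  // the mirror write is what makes replay and evals possible later
  await db.turn.createMany({
    data: [
      { convId, role: "user", content: userMessage },
      { convId, role: "assistant", content: response.output_text },
    ],
  });
  await db.conversation.update({
    where: { id: convId },
    data: { lastResponseId: response.id },
  });
  return response.output_text;
}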
The cliff I hit: I assumed previous_response_id would still work after 31 days. It does not - the server returns a 404. Wrap every call in a fallback that reconstructs from your DB if the chain is missing.
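The fallback looks roughly like this - the 404 check is the load-bearing part, and db.turn is the same hypothetical history table from the Pattern 3 sketch:
// fall back to explicit history when the server-side chain has expired
async function safeTurn(convId, userMessage) {
  const conv = await db.conversation.findUnique({ where: { id: convId } });
  try {
    return await client.responses.create({
      model: "gpt-5.5",
      instructions: SYSTEM_PROMPT,
      input: userMessage,
      previous_response_id: conv.lastResponseId,
      store: true,
    });
  } catch (err) {
    if (err.status !== 404) throw err; // only handle the expired-chain case
    const history = await db.turn.findMany({
      where: { convId },
      orderBy: { createdAt: "asc" },
    });
    return await client.responses.create({
      model: "gpt-5.5",
      instructions: SYSTEM_PROMPT,
      input: [
        ...history.map((m) => ({ role: m.role, content: m.content })),
        { role: "user", content: userMessage },
      ],
      store: true, // starts a fresh chain for future turns
    });
  }
}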
Function calling works, with a flatter schema. The tools array is the same shape. The big differences:
- code_interpreter and file_search are now first-class tools you enable per call. No more attaching them to an assistant. A sketch of the per-call wiring follows this list.
- requires_action now hangs off the response instead of the run, with a flatter shape (see the mapping table above).
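Per-call enablement looks roughly like this. The file_search and code_interpreter shapes follow the Responses docs as I know them, but verify the container options against your SDK version; VECTOR_STORE_ID and TOOLS are stand-ins for your own values:
// built-in tools are enabled per call, next to your own function tools
const response = await client.responses.create({
  model: "gpt-5.5",
  instructions: SYSTEM_PROMPT,
  input: userMessage,
  tools: [
    { type: "file_search", vector_store_ids: [VECTOR_STORE_ID] },
    { type: "code_interpreter", container: { type: "auto" } },
    ...TOOLS, // your function tools, unchanged
  ],
});
Here is the parallel-tool gotcha. In Assistants, this code was safe: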
// Assistants - implicit serial
const outputs = [];
for (const call of run.required_action.submit_tool_outputs.tool_calls) {
  outputs.push({
    tool_call_id: call.id,
    output: await runTool(call), // safe, one at a time
  });
}
In Responses, the model now expects you to handle multiple tool calls concurrently. If runTool is not idempotent or hits a rate-limited downstream, batch your calls or Promise.all them with a concurrency cap:
import pLimit from "p-limit";
const limit = pLimit(3);
const outputs = await Promise.all(
response.required_action.submit_tool_outputs.tool_calls.map((call) =>
limit(() => runTool(call))
)
);
I missed this on my first migration. The customer-support agent fired four parallel ticket-update calls to a legacy CRM and got rate-limited into oblivion within an hour.
The migration is mechanical but the behavior is not always identical. Different default temperatures, different tool-call patterns, different message-formatting quirks. I would not cut over without a regression eval.
My harness: a flag-gated rollout where 10% of traffic goes to Responses, 90% to Assistants, both runs are logged with the same input, and a nightly job scores the diffs. I open-sourced the bones of this as Agent Eval Bench - input replay, output diff, automated grading via a stronger model.
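The router itself is small. In this sketch, hashToPercent, assistantsTurn, responsesTurn, and logForEval are my stand-ins for whatever flag and logging plumbing you already run:
// flag-gated split - sticky per conversation so a chain stays on one API
async function routedTurn(convId, userMessage) {
  const useResponses = hashToPercent(convId) < 10; // 10% to Responses
  const output = useResponses
    ? await responsesTurn(convId, userMessage)
    : await assistantsTurn(convId, userMessage);
  // both arms log the same shape; the nightly job diffs and grades them
  await logForEval({
    convId,
    api: useResponses ? "responses" : "assistants",
    input: userMessage,
    output,
  });
  return output;
}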
The cutover schedule that worked for me: build the Responses path behind the flag, run the 10% split until the nightly evals come back clean, cut the stateless endpoints over first, migrate the long-lived stateful chains last, and keep @deprecated comments on the old Assistants call sites for one more month, then delete. Burn-down looked roughly like this in my logs:
Day 1: 47 endpoints calling Assistants
Day 7: 47 (built path, no traffic yet)
Day 9: 47 → 47 (10% rollout, both alive)
Day 14: 47 → 12 (cut the safe ones, kept stateful chains on assistants)
Day 21: 12 → 3 (long-lived chain edge cases)
Day 28: 0
The last three were the long-lived stateful chains where I needed Pattern 2 above (explicit history). They took longer because I had to backfill DB writes for conversations that had been server-stateful for months.
If you are starting this migration now, in priority order:
- Audit your runTool implementations for shared mutable state. The parallel-by-default behavior will find every race condition you have - a cheap fix is sketched after this list.
- OpenAI gave us through 2026, which sounds generous until you remember every other library you depend on is also moving. Do not be the one team migrating in October.
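The race audit usually ends in an idempotency key. A tool call's id is stable across retries, so deduping on it means parallel dispatch cannot double-apply a write. A minimal in-memory sketch - swap the Map for a DB unique constraint in production:
// dedupe tool execution by tool_call id
const inFlight = new Map();
function runToolOnce(call) {
  let pending = inFlight.get(call.id);
  if (!pending) {
    pending = runTool(call); // your existing executor
    inFlight.set(call.id, pending);
  }
  return pending;
}
Swap runToolOnce in for runTool inside the p-limit snippet above and the double-apply class of races disappears.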
The Responses API is the better primitive. It is simpler, more honest about state, and the streaming model finally feels native. The migration is a weekend of work for a small codebase and two weeks for a complex one. Worth it.