
TL;DR
A deep comparison of Codex's new /goal loop and Claude managed agents outcomes, with practical workflow examples, control tradeoffs, and migration guidance for long-running tasks.
Best for
Developers comparing real tool tradeoffs before choosing a stack.
Covers
Verdict, tradeoffs, pricing signals, workflow fit, and related alternatives.
There are two similar-sounding directions for making long-running agents less flaky: Codex shipped /goal in version 0.128.0, and Claude managed agents shipped outcomes. They are both about keeping the loop going until quality is actually acceptable, but they solve it at different layers.
If you are here to choose a workflow, the short answer is:
- Use /goal when you want a coding agent to keep making progress inside a terminal session, especially across repo edits, tests, retries, and interruptions.
- Use outcomes when acceptance needs explicit, rubric-graded quality criteria.

For adjacent decisions, read the broader Codex vs Claude Code comparison, the Claude Code vs Codex side-by-side page, the April Codex changelog analysis, and the AI agent frameworks guide. If you are asking whether Codex can handle tasks beyond code, read Codex as a general-purpose AI agent. If cost is the deciding factor, start with the pricing hub.
What Codex shipped: /goal

Codex's own changelog says 0.128.0 added persisted /goal workflows with app-server APIs, model tools, runtime continuation, and TUI controls to create, pause, resume, and clear goals (OpenAI Codex changelog).
That sounds simple in a headline, but the interesting part is the implementation shape:
Goal state is persisted, the TUI exposes lifecycle controls (create, pause, resume, clear), and the CLI can continue work without you typing follow-up prompts each turn.

The older pattern was: send a goal, let the model act a bit, stop, send the next command. /goal inverts that pattern so the agent keeps iterating in one execution envelope until stop criteria are met.
The old problem is usually one of loop boundary leakage: the agent halts at turn boundaries, context and momentum leak away, and a human has to re-enter the loop to keep it moving. A persisted goal narrows this by formalizing loop continuation and reducing human re-entry overhead.
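To see why persistence matters, here is a minimal Python sketch of a checkpointed goal loop that formalizes continuation; run_agent_step, the state shape, and the stop criterion are hypothetical stand-ins, not Codex internals:

```python
# Illustrative sketch only: this models the control flow of a persisted
# goal loop, not Codex's actual implementation. run_agent_step and the
# state shape are hypothetical placeholders.
import json
from pathlib import Path

STATE_FILE = Path(".goal_state.json")

def run_agent_step(goal: str, state: dict) -> dict:
    """Placeholder for one model/tool turn. Returns updated state."""
    state["iterations"] += 1
    # A real step would edit files, run tests, and so on.
    state["done"] = state["iterations"] >= 3  # stand-in stop criterion
    return state

def run_persisted_goal(goal: str, max_iterations: int = 20) -> dict:
    # Resume from persisted state if a prior session was interrupted,
    # instead of requiring a human to re-prompt from scratch.
    if STATE_FILE.exists():
        state = json.loads(STATE_FILE.read_text())
    else:
        state = {"goal": goal, "iterations": 0, "done": False}

    while not state["done"] and state["iterations"] < max_iterations:
        state = run_agent_step(goal, state)
        # Checkpoint after every turn so pause or interruption is cheap.
        STATE_FILE.write_text(json.dumps(state))

    return state

if __name__ == "__main__":
    print(run_persisted_goal("fix failing test suite"))
```

The checkpoint-after-every-turn design is the whole point: an interruption costs you one turn, not the whole session.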
Where /goal likely shines

From the release context and the existing Codex command model:

- Long terminal-native tasks: migrations, test-fix loops, and repo refactors with many shell passes.
- The same release window also added /statusline and /title editing during active turns, which points to a richer TUI-centered workflow control plane.

I read that as: /goal is primarily a tooling-loop enhancement around coding-agent endurance.
What Claude shipped: managed agents outcomes

Claude managed agents expose outcomes as a research preview, where you define what "done" looks like and the system works toward that target with a grader loop. The managed agents documentation says outcomes let you tell the agent what "done" looks like, then grade each criterion in a separate context window until the outcome is satisfied or max iterations are hit (Claude managed agents outcomes).
Key details in that page:
- You define explicit acceptance criteria, and a grader evaluates them per criterion in a separate context window from the working agent.
- The loop ends in an explicit result state (satisfied, needs_revision, max_iterations_reached, failed).

This is not just "keep looping." It is closed-loop evaluation with explicit quality criteria.
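The four result states map naturally onto a grader loop. Here is a minimal Python sketch of that control flow, assuming hypothetical do_work and grade callables; it models the shape of the loop, not the managed agents API itself:

```python
# Conceptual sketch of a closed-loop grader. The status strings come from
# the managed agents outcomes docs; everything else here is hypothetical.
from typing import Callable

def outcome_loop(
    do_work: Callable[[list[str]], str],   # produces a candidate artifact
    grade: Callable[[str, str], bool],     # grades one criterion in isolation
    criteria: list[str],
    max_iterations: int = 5,
) -> str:
    feedback: list[str] = []
    for _ in range(max_iterations):
        try:
            artifact = do_work(feedback)
        except Exception:
            return "failed"
        # Per-criterion grading happens in a separate context from the
        # work itself, so the grader is not anchored on the agent's framing.
        unmet = [c for c in criteria if not grade(artifact, c)]
        if not unmet:
            return "satisfied"
        # Each unmet criterion becomes a needs_revision-style follow-up
        # fed into the next work pass.
        feedback = [f"revise to satisfy: {c}" for c in unmet]
    return "max_iterations_reached"
```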
Let's compare from a design perspective.
- Termination: /goal (Codex) uses runtime/command-oriented termination via the agent loop and manual controls (pause, clear, budget limits in feature context); loop completion is driven by model judgment plus command state. outcomes (Claude) are externally graded against rubric criteria in a separate context, which makes termination a function of measured rubric satisfaction.
- Quality definition: /goal quality is implicit, shaped by your prompt and agent context; outcomes make quality explicit in the rubric.
- Integration: /goal integrates with existing Codex sessions and CLI continuity (especially useful when you already live inside a terminal loop).
- Observability: outcomes emit evaluation telemetry (span.outcome_evaluation_*) that is useful for observability and audit.
- Adoption cost: /goal is a command feature and likely lighter to adopt if you already standardize around Codex in a repo.

Use /goal when: you need persistent CLI execution with many shell passes and manual checkpoints (pause, status, final diff). This is the right shape when your objective is operational execution and tool-orchestration speed.
Use outcomes when: you need objective quality checks.
This is the right shape when acceptance is judgment-heavy and you need repeatability.
Hybrid approach:
- /goal: first pass that extracts stack-trace clusters and prepares candidate fixes.
- outcome: second pass with a rubric requiring reproduction steps, a regression test, and evidence artifact links.

This gives the endurance of /goal plus rubric-level correctness from outcomes.
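To make the hand-off concrete, here is a minimal Python sketch of the two-phase pipeline; run_goal_phase, grade, and the rubric strings are illustrative stand-ins, not the Codex or Claude APIs:

```python
# Hypothetical two-phase pipeline: a /goal-style execution phase followed
# by a rubric-graded quality gate. All names and the rubric are illustrative.
RUBRIC = [
    "reproduction steps",
    "regression test",
    "evidence artifact links",
]

def run_goal_phase(task: str) -> str:
    """Stand-in for the endurance phase (e.g., a Codex /goal session)."""
    return (f"candidate fix for {task} with reproduction steps, "
            "regression test, evidence artifact links")

def grade(artifact: str, criterion: str) -> bool:
    """Stand-in grader; a real gate would call a model with the rubric."""
    return criterion in artifact

def hybrid(task: str, max_iterations: int = 3) -> str:
    for _ in range(max_iterations):
        artifact = run_goal_phase(task)      # endurance: keep making progress
        if all(grade(artifact, c) for c in RUBRIC):
            return "satisfied"               # correctness: graded stop decision
        # A real pipeline would feed unmet criteria back into the next pass.
    return "max_iterations_reached"

if __name__ == "__main__":
    print(hybrid("crash cluster triage"))
```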
Don't treat /goal as output quality control

/goal is excellent for keeping work moving, but without explicit criteria it can optimize for forward motion over quality nuance.

- Outcomes still depend on rubric design: a bad rubric means a bad stopping decision.
- Codex has token/continuation limits in feature work; outcomes has max iterations and explicit failed/max_iterations_reached result states.
- Outcomes is explicitly a research preview. Plan for fallback runbooks, as sketched below.
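A fallback runbook can start as a simple status-to-action map. In this sketch, the status strings come from the outcomes docs, while the actions are placeholders for your own operations:

```python
# Illustrative fallback runbook: map terminal outcome statuses to a
# recovery action. The actions here are assumptions about your ops setup.
def handle_outcome(status: str, outcome_id: str) -> str:
    runbook = {
        "satisfied": "ship: attach the evaluation summary to the PR",
        "needs_revision": "continue: budget remains, let the loop iterate",
        "max_iterations_reached": "escalate: page a human with partial artifacts",
        "failed": "fallback: rerun with the minimal safety/format rubric",
    }
    action = runbook.get(status, "escalate: unknown status, stop the pipeline")
    print(f"outcome {outcome_id}: {action}")  # emit to telemetry in practice
    return action
```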
If you are currently on Codex-only loops, start with:

- Test /goal in a staging workspace, measuring iteration count, interruption frequency, and budget exhaustion.
- Add manual checkpoint artifacts after each loop.

If you then add managed-agent workloads:

- Define rubric templates as versioned files: one minimal safety/format rubric and one full business-quality rubric.
- Emit outcome IDs and evaluation summaries to your telemetry store.
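Here is a minimal sketch of the rubric-versioning and telemetry steps above; the file layout, record schema, and sink are assumptions, not a documented API:

```python
# Illustrative: rubric templates as versioned JSON files plus a telemetry
# emit. File paths, schema, and the telemetry sink are all assumptions.
import json
import time
from pathlib import Path

RUBRICS = Path("rubrics")

def load_rubric(name: str) -> dict:
    # e.g. rubrics/safety-format.v1.json or rubrics/business-quality.v1.json
    return json.loads((RUBRICS / f"{name}.json").read_text())

def emit_evaluation(outcome_id: str, status: str, summary: dict) -> None:
    record = {
        "ts": time.time(),
        "outcome_id": outcome_id,
        "status": status,    # satisfied / needs_revision / ...
        "summary": summary,  # per-criterion grading results
    }
    # Stand-in sink: append to a local log; swap for your telemetry store.
    with open("outcome_telemetry.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")
```

Versioning the rubric files matters because a rubric change silently changes what "done" means; the version in the filename makes that auditable.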
If you are choosing where to start right now:
If your work is terminal-native development, start with /goal; if acceptance is judgment-heavy and needs an audit trail, start with outcomes.

This is the sharp distinction:

- /goal = loop state as runtime control.
- outcomes = quality state as acceptance control.

They are converging, but they are not redundant yet.
For teams building production automations, the highest-leverage stack is often both:
/goal for "keep going and recover from interruption."/goal workflows and related items): https://developers.openai.com/codex/changelogCodex /goal is a runtime control feature built around persisted workflows, runtime continuation, and TUI controls for creating, pausing, resuming, and clearing goal state. Claude managed outcomes is a quality control feature that uses explicit rubrics to grade whether work meets acceptance criteria before stopping. Use /goal for persistent execution, outcomes for measurable deliverables.
When should I use Codex /goal instead of Claude outcomes?

Use Codex /goal when your task is terminal-native development work like migrations, test fixes, or repo refactoring where the primary need is durable continuation through repo edits and shell-driven repair cycles. If your task needs an explicit acceptance rubric or audit trail, use Claude outcomes instead.
Can I combine Codex /goal and Claude outcomes?

Yes, a hybrid approach is often the best choice for long-running, high-stakes work. Use Codex /goal for the execution phase, where the agent needs to keep making progress across shell commands and test cycles. Then use Claude outcomes as a final quality gate with explicit rubric criteria. This gives you both execution endurance and measurable correctness.
How do I migrate existing workflows to /goal and outcomes?

Start by testing /goal in a staging workspace to measure iteration count, interruption frequency, and budget exhaustion. Add manual checkpoint artifacts after each loop. When adding managed-agent workloads, define rubric templates as versioned files: one minimal safety/format rubric and one full business-quality rubric. Emit outcome IDs and evaluation summaries to your telemetry store.
What are the limitations of Codex /goal?

The main limitation is that /goal optimizes for forward motion, not output quality. Without explicit acceptance criteria, it can complete work that passes tests but misses quality nuance. It also has token and continuation limits that may halt complex tasks. For quality-critical workflows, pair /goal with a rubric-based quality check or use Claude outcomes.
What are the limitations of Claude managed outcomes?

Claude managed outcomes depends entirely on rubric design: a bad rubric leads to bad stopping decisions. It is also marked as a research preview, so you should plan for fallback runbooks. The managed-agent API requires preview beta headers and has max iteration limits. When the grader returns max_iterations_reached or failed, you need a recovery path.
Which should production teams adopt?

For production, the highest-leverage stack is often both: use Codex /goal for "keep going and recover from interruption" during execution, then use Claude outcomes when handoff quality must be measurable and rubric-traceable. This combines the execution resilience of /goal with the quality assurance of rubric-graded outcomes.
How do Codex /goal and Claude outcomes compare on cost?

The public docs do not provide a clean apples-to-apples price formula. Treat Codex /goal cost as the underlying Codex session and model usage, then check OpenAI Codex pricing before estimating. Treat Claude outcomes as managed-agent usage plus outcome evaluation usage, then check Claude API pricing. For budget-sensitive work, start with the AI coding tools pricing comparison and the pricing hub.