
TL;DR
A deep comparison of Codex's new /goal loop and Claude managed agents outcomes, with practical workflow examples, control tradeoffs, and migration guidance for long-running tasks.
Best for
Developers comparing real tool tradeoffs before choosing a stack.
Covers
Verdict, tradeoffs, pricing signals, workflow fit, and related alternatives.
There are two similar-sounding directions for making long-running agents less flaky: Codex shipped /goal in version 0.128.0, and Claude managed agents shipped outcomes. They are both about keeping the loop going until quality is actually acceptable, but they solve it at different layers.
If you are here to choose a workflow, the short answer is:
- Use /goal when you want a coding agent to keep making progress inside a terminal session, especially across repo edits, tests, retries, and interruptions.
- Use outcomes when acceptance needs explicit, rubric-graded quality criteria.

For adjacent decisions, read the broader Codex vs Claude Code comparison, the Claude Code vs Codex side-by-side page, the April Codex changelog analysis, and the AI agent frameworks guide. If you are asking whether Codex can handle tasks beyond code, read Codex as a general-purpose AI agent. If cost is the deciding factor, start with the pricing hub.
What Codex shipped: /goal

Codex's own changelog says 0.128.0 added persisted /goal workflows with app-server APIs, model tools, runtime continuation, and TUI controls to create, pause, resume, and clear goals (OpenAI Codex changelog).
That sounds simple in a headline, but the interesting part is the implementation shape:
Goal state is persisted, the TUI exposes lifecycle controls (create, pause, resume, clear), and the CLI can continue work without you typing follow-up prompts each turn.

The older pattern was: send a goal, let the model act a bit, stop, send the next command. /goal inverts that pattern so the agent keeps iterating in one execution envelope until stop criteria are met.
The old problem is usually one of loop boundary leakage: the agent halts at turn boundaries, context and momentum leak away, and a human has to re-enter the loop to keep it moving. A persisted goal narrows this by formalizing loop continuation and reducing human re-entry overhead.
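To see why persistence matters, here is a minimal Python sketch of a checkpointed goal loop that formalizes continuation; run_agent_step, the state shape, and the stop criterion are hypothetical stand-ins, not Codex internals:

```python
# Illustrative sketch only: this models the control flow of a persisted
# goal loop, not Codex's actual implementation. run_agent_step and the
# state shape are hypothetical placeholders.
import json
from pathlib import Path

STATE_FILE = Path(".goal_state.json")

def run_agent_step(goal: str, state: dict) -> dict:
    """Placeholder for one model/tool turn. Returns updated state."""
    state["iterations"] += 1
    # A real step would edit files, run tests, and so on.
    state["done"] = state["iterations"] >= 3  # stand-in stop criterion
    return state

def run_persisted_goal(goal: str, max_iterations: int = 20) -> dict:
    # Resume from persisted state if a prior session was interrupted,
    # instead of requiring a human to re-prompt from scratch.
    if STATE_FILE.exists():
        state = json.loads(STATE_FILE.read_text())
    else:
        state = {"goal": goal, "iterations": 0, "done": False}

    while not state["done"] and state["iterations"] < max_iterations:
        state = run_agent_step(goal, state)
        # Checkpoint after every turn so pause or interruption is cheap.
        STATE_FILE.write_text(json.dumps(state))

    return state

if __name__ == "__main__":
    print(run_persisted_goal("fix failing test suite"))
```

The checkpoint-after-every-turn design is the whole point: an interruption costs you one turn, not the whole session.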
Where /goal likely shines

From the release context and the existing Codex command model:

- Long terminal-native tasks: migrations, test-fix loops, and repo refactors with many shell passes.
- The same release window also added /statusline and /title editing during active turns, which points to a richer TUI-centered workflow control plane.

I read that as: /goal is primarily a tooling-loop enhancement around coding-agent endurance.
What Claude shipped: managed agents outcomes

Claude managed agents expose outcomes as a research preview, where you define what "done" looks like and the system works toward that target with a grader loop. The managed agents documentation says outcomes let you tell the agent what "done" looks like, then grade each criterion in a separate context window until the outcome is satisfied or max iterations are hit (Claude managed agents outcomes).
Key details in that page:
- You define explicit acceptance criteria, and a grader evaluates them per criterion in a separate context window from the working agent.
- The loop ends in an explicit result state (satisfied, needs_revision, max_iterations_reached, failed).

This is not just "keep looping." It is closed-loop evaluation with explicit quality criteria.
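The four result states map naturally onto a grader loop. Here is a minimal Python sketch of that control flow, assuming hypothetical do_work and grade callables; it models the shape of the loop, not the managed agents API itself:

```python
# Conceptual sketch of a closed-loop grader. The status strings come from
# the managed agents outcomes docs; everything else here is hypothetical.
from typing import Callable

def outcome_loop(
    do_work: Callable[[list[str]], str],   # produces a candidate artifact
    grade: Callable[[str, str], bool],     # grades one criterion in isolation
    criteria: list[str],
    max_iterations: int = 5,
) -> str:
    feedback: list[str] = []
    for _ in range(max_iterations):
        try:
            artifact = do_work(feedback)
        except Exception:
            return "failed"
        # Per-criterion grading happens in a separate context from the
        # work itself, so the grader is not anchored on the agent's framing.
        unmet = [c for c in criteria if not grade(artifact, c)]
        if not unmet:
            return "satisfied"
        # Each unmet criterion becomes a needs_revision-style follow-up
        # fed into the next work pass.
        feedback = [f"revise to satisfy: {c}" for c in unmet]
    return "max_iterations_reached"
```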
Let's compare from a design perspective.
- Termination: /goal (Codex) uses runtime/command-oriented termination via the agent loop and manual controls (pause, clear, budget limits in feature context); loop completion is driven by model judgment plus command state. outcomes (Claude) are externally graded against rubric criteria in a separate context, which makes termination a function of measured rubric satisfaction.
- Quality definition: /goal quality is implicit, shaped by your prompt and agent context; outcomes make quality explicit in the rubric.
- Integration: /goal integrates with existing Codex sessions and CLI continuity (especially useful when you already live inside a terminal loop).
- Observability: outcomes emit evaluation telemetry (span.outcome_evaluation_*) that is useful for observability and audit.
- Adoption cost: /goal is a command feature and likely lighter to adopt if you already standardize around Codex in a repo.

Use /goal when: you need persistent CLI execution with many shell passes and manual checkpoints (pause, status, final diff). This is the right shape when your objective is operational execution and tool-orchestration speed.
Use outcomes when: you need objective quality checks.
This is the right shape when acceptance is judgment-heavy and you need repeatability.
Hybrid approach:
- /goal: first pass that extracts stack-trace clusters and prepares candidate fixes.
- outcome: second pass with a rubric requiring reproduction steps, a regression test, and evidence artifact links.

This gives the endurance of /goal plus rubric-level correctness from outcomes.
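To make the hand-off concrete, here is a minimal Python sketch of the two-phase pipeline; run_goal_phase, grade, and the rubric strings are illustrative stand-ins, not the Codex or Claude APIs:

```python
# Hypothetical two-phase pipeline: a /goal-style execution phase followed
# by a rubric-graded quality gate. All names and the rubric are illustrative.
RUBRIC = [
    "reproduction steps",
    "regression test",
    "evidence artifact links",
]

def run_goal_phase(task: str) -> str:
    """Stand-in for the endurance phase (e.g., a Codex /goal session)."""
    return (f"candidate fix for {task} with reproduction steps, "
            "regression test, evidence artifact links")

def grade(artifact: str, criterion: str) -> bool:
    """Stand-in grader; a real gate would call a model with the rubric."""
    return criterion in artifact

def hybrid(task: str, max_iterations: int = 3) -> str:
    for _ in range(max_iterations):
        artifact = run_goal_phase(task)      # endurance: keep making progress
        if all(grade(artifact, c) for c in RUBRIC):
            return "satisfied"               # correctness: graded stop decision
        # A real pipeline would feed unmet criteria back into the next pass.
    return "max_iterations_reached"

if __name__ == "__main__":
    print(hybrid("crash cluster triage"))
```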
Don't treat /goal as output quality control

/goal is excellent for keeping work moving, but without explicit criteria it can optimize for forward motion over quality nuance.

- Outcomes still depend on rubric design: a bad rubric means a bad stopping decision.
- Codex has token/continuation limits in feature work; outcomes has max iterations and explicit failed/max_iterations_reached result states.
- Outcomes is explicitly a research preview. Plan for fallback runbooks, as sketched below.
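A fallback runbook can start as a simple status-to-action map. In this sketch, the status strings come from the outcomes docs, while the actions are placeholders for your own operations:

```python
# Illustrative fallback runbook: map terminal outcome statuses to a
# recovery action. The actions here are assumptions about your ops setup.
def handle_outcome(status: str, outcome_id: str) -> str:
    runbook = {
        "satisfied": "ship: attach the evaluation summary to the PR",
        "needs_revision": "continue: budget remains, let the loop iterate",
        "max_iterations_reached": "escalate: page a human with partial artifacts",
        "failed": "fallback: rerun with the minimal safety/format rubric",
    }
    action = runbook.get(status, "escalate: unknown status, stop the pipeline")
    print(f"outcome {outcome_id}: {action}")  # emit to telemetry in practice
    return action
```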
If you are currently on Codex-only loops, start with:

- Test /goal in a staging workspace, measuring iteration count, interruption frequency, and budget exhaustion.
- Add manual checkpoint artifacts after each loop.

If you then add managed-agent workloads:

- Define rubric templates as versioned files: one minimal safety/format rubric and one full business-quality rubric.
- Emit outcome IDs and evaluation summaries to your telemetry store.
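Here is a minimal sketch of the rubric-versioning and telemetry steps above; the file layout, record schema, and sink are assumptions, not a documented API:

```python
# Illustrative: rubric templates as versioned JSON files plus a telemetry
# emit. File paths, schema, and the telemetry sink are all assumptions.
import json
import time
from pathlib import Path

RUBRICS = Path("rubrics")

def load_rubric(name: str) -> dict:
    # e.g. rubrics/safety-format.v1.json or rubrics/business-quality.v1.json
    return json.loads((RUBRICS / f"{name}.json").read_text())

def emit_evaluation(outcome_id: str, status: str, summary: dict) -> None:
    record = {
        "ts": time.time(),
        "outcome_id": outcome_id,
        "status": status,    # satisfied / needs_revision / ...
        "summary": summary,  # per-criterion grading results
    }
    # Stand-in sink: append to a local log; swap for your telemetry store.
    with open("outcome_telemetry.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")
```

Versioning the rubric files matters because a rubric change silently changes what "done" means; the version in the filename makes that auditable.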
If you are choosing where to start right now:
If your work is terminal-native development, start with /goal; if acceptance is judgment-heavy and needs an audit trail, start with outcomes.

This is the sharp distinction:

- /goal = loop state as runtime control.
- outcomes = quality state as acceptance control.

They are converging, but they are not redundant yet.
For teams building production automations, the highest-leverage stack is often both:
/goal for "keep going and recover from interruption."/goal workflows and related items): https://developers.openai.com/codex/changelogCodex /goal is a runtime control feature built around persisted workflows, runtime continuation, and TUI controls for creating, pausing, resuming, and clearing goal state. Claude managed outcomes is a quality control feature that uses explicit rubrics to grade whether work meets acceptance criteria before stopping. Use /goal for persistent execution, outcomes for measurable deliverables.
When should I use Codex /goal instead of Claude outcomes?

Use Codex /goal when your task is terminal-native development work like migrations, test fixes, or repo refactoring where the primary need is durable continuation through repo edits and shell-driven repair cycles. If your task needs an explicit acceptance rubric or audit trail, use Claude outcomes instead.
Can I combine Codex /goal and Claude outcomes?

Yes, a hybrid approach is often the best choice for long-running, high-stakes work. Use Codex /goal for the execution phase, where the agent needs to keep making progress across shell commands and test cycles. Then use Claude outcomes as a final quality gate with explicit rubric criteria. This gives you both execution endurance and measurable correctness.
How do I migrate existing workflows to /goal and outcomes?

Start by testing /goal in a staging workspace to measure iteration count, interruption frequency, and budget exhaustion. Add manual checkpoint artifacts after each loop. When adding managed-agent workloads, define rubric templates as versioned files: one minimal safety/format rubric and one full business-quality rubric. Emit outcome IDs and evaluation summaries to your telemetry store.
What are the limitations of Codex /goal?

The main limitation is that /goal optimizes for forward motion, not output quality. Without explicit acceptance criteria, it can complete work that passes tests but misses quality nuance. It also has token and continuation limits that may halt complex tasks. For quality-critical workflows, pair /goal with a rubric-based quality check or use Claude outcomes.
What are the limitations of Claude managed outcomes?

Claude managed outcomes depends entirely on rubric design: a bad rubric leads to bad stopping decisions. It is also marked as a research preview, so you should plan for fallback runbooks. The managed-agent API requires preview beta headers and has max iteration limits. When the grader returns max_iterations_reached or failed, you need a recovery path.
Which should production teams adopt?

For production, the highest-leverage stack is often both: use Codex /goal for "keep going and recover from interruption" during execution, then use Claude outcomes when handoff quality must be measurable and rubric-traceable. This combines the execution resilience of /goal with the quality assurance of rubric-graded outcomes.
How do Codex /goal and Claude outcomes compare on cost?

The public docs do not provide a clean apples-to-apples price formula. Treat Codex /goal cost as the underlying Codex session and model usage, then check OpenAI Codex pricing before estimating. Treat Claude outcomes as managed-agent usage plus outcome evaluation usage, then check Claude API pricing. For budget-sensitive work, start with the AI coding tools pricing comparison and the pricing hub.