Building and using AI agents - multi-agent systems, autonomous coding, and orchestration.
122 resources - 116 posts, 2 tools, 4 guides
Claude agents vs skills, untangled: agents are workers with their own context window, skills are instructions loaded on demand. Here is the decision table.
Auto mode replaces permission prompts with a background safety classifier - here is how the Shift+Tab cycle, hard_deny rules, and glob deny patterns actually fit together.
Claude Code dynamic workflows turn orchestration into a JavaScript script that runs up to 1,000 agents per run - here is how scripts, schemas, budgets, and resume actually work.
Anthropic says persistent file-based memory improved Fable 5 three times more than it improved Opus 4.8. Here is the full memory tool setup - handlers, security, and context editing included.
A practical playbook for running Claude Fable 5 as the orchestrator over Sonnet and Haiku workers, with verified cost math on when the premium pays off.
Task budgets give Claude a token countdown for the whole agentic loop, so the model paces itself instead of discovering the limit when max_tokens truncates it. Here is how the beta works on Fable 5, what it does not enforce, and where it fits next to effort and the Usage API.
An ops guide to managing a fleet of Claude agents: spawning patterns, worktree isolation, build gates, orphaned-agent failure modes, and OpenTelemetry monitoring.
Rewriting prompts and skills for Fable 5: what changes when you migrate agents from Opus 4.x, how effort interplay works, and which old workarounds now hurt.
Ultracode is two documented things: a prompt keyword that turns one task into a dynamic workflow, and an /effort setting that pairs xhigh reasoning with automatic orchestration. Here is exactly what the docs say.
Twelve documented Claude Fable 5 use patterns - agent orchestration, overnight runs, 1M-context refactors, effort tuning - each with a how-to seed and doc link.
Fable 5 posts an 80.3% SWE-Bench Pro score and costs 2x Opus 4.8 - here is the task-profile scoring guide that tells you when the premium pays off.
AI SDK 6 ships ToolLoopAgent and full MCP support. LangGraph hits 1.0 GA with durable state and built-in interrupt/resume. Here is how to choose between them for your TypeScript team.
Goose is a Rust-built AI agent with a CLI, desktop app, and API that runs against 15+ LLM providers and extends through 70+ MCP extensions - here is why developers are installing it.

OpenAI's harness engineering post and new token-use research point to the same lesson: agentic coding teams need token budgets, receipts, and eval loops, not vibes.
Headroom is a context compression layer that intercepts your AI agent's tool outputs and strips 60-95% of the tokens before they hit the model - with benchmarked accuracy preserved.

Anthropic's open-source vulnerability harness shows where AI security work is going: reproducible exploit loops, separate verification agents, and patch receipts.

Anthropic's Claude containment writeup points to the next security layer for coding agents: deterministic capability ledgers, not another approval prompt.
Headroom is a context compression layer trending on GitHub that slashes token usage by 60-95% across Claude Code, Cursor, and other AI coding agents - without sacrificing accuracy.

GitHub Trending is full of agent memory and context tools. The useful version is not magic recall. It is a context ledger: source-linked, scoped, expiring memory that agents can inspect and users can audit.

The ChatGPT for Google Sheets exfiltration report is not just a spreadsheet bug. It is a warning about agentic office tools: permissions need to be action-scoped, logged, revocable, and visible.

A huge Hacker News thread says domain expertise is the real moat in agentic coding. The sharper version: tacit judgment only compounds when you turn it into examples, tests, DSLs, and review gates.

Before an AI agent gets tools, files, APIs, MCP servers, or deployment access, decide what it can read, write, call, log, and roll back.

Mastra is the strongest fit when a TypeScript product needs agents, workflows, memory, tools, MCP, evals, and traces in one backend layer. It is not the right answer for every chat feature.

A practical field note on where Mastra, CopilotKit, and LangGraph fit when you are building the same agent-native product interface.

The AI coding market is noisy. The changes that matter are easier to spot when you separate model capability, editor loops, terminal agents, background agents, agent frameworks, UI layers, context, security, and cost.

If I were rebuilding my AI coding workflow on May 30, 2026, I would not pick one magic tool. I would pick a layered stack: terminal agent, editor, background agent, Mastra, CopilotKit, MCP, context, security, and cost controls.

AI coding agents become safer when permissions, logs, and rollback are designed as one system. Here is the operating loop I would put around any agent that can edit code, run tools, or open pull requests.

Prompt injection stops being an abstract LLM risk once an agent can call tools. The practical defense is data boundaries, structured handoffs, tool guardrails, and approval gates around side effects.

May 2026 was not about one more coding model leaderboard. The useful signal was control planes, UI-agent contracts, durable TypeScript workflows, usage economics, and runtime security.

CopilotKit is strongest when you treat it as the product-facing agent UI layer: chat surfaces, frontend tools, shared state, generative UI, and human approval around a backend agent.

Claude Opus 4.8 looks like a benchmark bump, but the developer story is better honesty, dynamic workflows, and effort controls that make long-running agent work easier to review.

CodeGraph is trending because AI coding teams are running into the same bottleneck: agents waste too many tokens rediscovering the repo. Local indexes help, but only if you treat them as navigation aids instead of source truth.

AI coding agents have crossed from demo to daily workflow. The next bottleneck is not demand. It is cost attribution, budget gates, and workflow design that keeps agent fleets from turning useful work into surprise spend.

A front-page Hacker News essay about being tired of AI answers points at a real developer problem: chat is too easy to launder into fake work. The fix is verifiable workflows, not more conversational polish.

GitHub is suddenly full of codebase knowledge graph projects for Claude Code, Codex, Cursor, and other agents. The useful version is not a pretty graph. It is a map that changes planning, editing, and review.

Anthropic's knowledge-work plugin repo is trending because it packages skills, connectors, slash commands, and sub-agents around job functions. The interesting shift is from personal prompts to team-distributed operating systems.

A new arXiv paper shows coding agents can pass loose backend tasks, then fall apart when architecture, database, and ORM constraints pile up. The fix is not longer markdown. It is executable constraints.

Reasonix hit Hacker News with a DeepSeek-native pitch: keep long coding sessions cheap by designing the agent loop around prefix caching. The interesting question is when cache efficiency helps quality, and when it fights the harness.
HKUDS/CLI-Anything hit 40,000 stars by solving a stubborn gap: most desktop software has no interface AI agents can reliably drive. Its 7-phase pipeline auto-generates a tested CLI harness from source code.

Anthropic's Project Glasswing update is a useful signal for developer teams: AI can find vulnerability candidates faster than humans can verify, disclose, patch, and ship them.
A practical framework for building LLM-powered software that actually ships to production customers - not just demos. 21.8k stars and still climbing.

The Multi-Stream LLMs paper argues that agents are bottlenecked by single chat streams. The practical takeaway is not to rebuild everything today, but to design agent runtimes around separated channels.

Runtime's Launch HN thread is a useful signal: teams do not just want isolated coding agents. They want a control plane for approvals, secrets, telemetry, review, and merge policy.

CodeGraph is trending because it points at a real bottleneck in AI coding: bigger context windows do not replace a fast, local, queryable map of the repository.

Forge hit the Hacker News front page with a strong claim: small local models can become much more useful at tool-calling when the harness catches structural failures, retries intelligently, and controls context.

Anthropic's Stainless acquisition is not just an SDK deal. It is a bet that agents need generated SDKs, CLIs, docs, and MCP servers from the same source of truth.
The humanlayer/12-factor-agents repo distills hard-won lessons from shipping AI agents into 12 concrete principles. It crossed 21,000 stars on GitHub this week.
humanlayer/12-factor-agents crossed 20k stars with a simple argument: most AI agents fail in production because they ignore decades of software engineering wisdom. Here are the twelve principles fixing that.
obra/superpowers is a composable skills framework for coding agents that turns vague requests into structured, test-driven development - and it runs across Claude Code, Codex, Cursor, and Gemini CLI.
agentmemory is a self-hosted MCP server that gives Claude Code, Cursor, and Gemini CLI searchable long-term memory across sessions - with 12 auto-capture hooks and 51 tools, no external database required.
obra/superpowers enforces a seven-step development loop - brainstorming through branch completion - across Claude Code, Cursor, Gemini CLI, and five other agent platforms.

Persistent memory for coding agents is trending because every session still starts too cold. The hard part is not saving facts. It is proving recall, freshness, deletion, and rollback under real development pressure.

Claude Platform on AWS matters because it moves agent adoption into identity, billing, commitments, and platform controls. That is where enterprise AI work gets real.

Thinking Machines' interaction-models post points at a useful shift for developer tools: stop designing around single chat turns and start designing around shared work.

The TanStack npm incident was not just a package-security story. It was a reminder that AI agent workflows inherit every weak trust boundary in CI.
agentmemory gives AI coding agents a persistent brain - capturing session context automatically via 12 Claude Code hooks and 51 MCP tools, with 95.2% retrieval accuracy and 92% token savings over context-pasting.

Claude Managed Agents now have multiagent sessions, outcomes, webhooks, and vault events. The practical takeaway is not just better agents. It is that agent runs need backend job discipline.
AgentMemory hit GitHub's daily trending list with 400 new stars today, offering a persistent memory layer for AI coding agents that benchmarks at 95.2% retrieval accuracy on LongMemEval-S and 92% token reduction.
DeepSeek-TUI is a Rust-built terminal coding agent wrapping the DeepSeek V4 API with full tool use, MCP server support, a composable skills system, and three operational modes for different risk tolerances.

31 deployed apps. 7 down. Favicons missing on 20 of 24 reachable hosts. Sentry on zero. Here is how a single audit turned into 58 PRs in one afternoon - and what shipped, what didn't, and what the pattern was.

Notes from a single session running 200+ Claude Code subagents in parallel across 35 repos. What worked, what broke, and the patterns I codified into a skill so the recipe replays.

Codex automations are useful when recurring engineering work has clear inputs, reviewable outputs, and safe boundaries. Here is the practical playbook.

OpenAI is turning Codex from a coding assistant into a broader agent workspace for files, apps, browser QA, images, automations, and repeatable knowledge work.

Boris Cherny's loop-heavy Claude Code workflow points at the next Codex content lane: recurring agents that babysit PRs, CI, deploys, and feedback streams.

Andrej Karpathy's loopy era frame explains why Codex is becoming less like a chatbot and more like an agent loop manager for real software work.
Ruflo crossed 37,700 GitHub stars this week, adding nearly 1,900 in a single day. It turns Claude Code into a coordinated swarm of 100+ specialized agents with MCP integration, distributed vector memory, and zero-trust agent federation.

Efficient agents do not stuff every tool result into the model context. They keep intermediate state in code, files, and execution environments, then return compact summaries and receipts.

Manual approval prompts stop protecting users when coding agents ask too often. The better pattern is risk-aware autonomy: safe defaults, narrow deny rules, and approvals only for meaningful changes.

Claude Code is turning into an orchestration layer for agent teams. Here is how subagents, MCP, hooks, and long context fit together in 2026.

A Show HN PDF form demo points at a bigger architecture shift: keep sensitive documents local, expose narrow browser tools to the model, and make AI assistance inspectable.

A deep comparison of Codex's new /goal loop and Claude managed agents outcomes, with practical workflow examples, control tradeoffs, and migration guidance for long-running tasks.

A long-form technical read on Flue from Fred K Schott, with deeper comparisons against OpenAI Agents, Vercel AI SDK, Google ADK, LangChain, Deep Agents, and CrewAI, plus practical production patterns.
Ruflo hit the top of GitHub trending this week with 35k stars. It turns Claude Code into a multi-agent platform with swarm coordination, vector memory, and cross-machine federation.

A long-running coding agent is only useful if the environment around it can queue tasks, capture logs, checkpoint state, verify behavior, limit cost, and recover from failure.

Most agent tool APIs are just REST endpoints with nicer names. Production agents need intent-shaped tools that compress workflows, reduce context, and return reviewable receipts.

Skills turn a general coding agent into a trained teammate by packaging runbooks, scripts, examples, and domain-specific judgment into reusable instructions.

Warp going open source is not just a terminal story. It is a signal that AI coding tools are shifting from chat UX toward agent operations, where planning, execution, review, and feedback loops live close to the shell.

I told an agent to improve the site every 10 minutes and went to sleep. Here is what 12 new repos, 60 PRs, and three goofs taught me about overnight orchestration.

A practical architecture for multi-step Claude agents. Loop patterns, state management, error recovery, and the production gotchas that turn a five-step demo into a 20 percent success rate at scale.

Build MCP servers that connect Claude to your databases, APIs, and tools. Architecture, TypeScript SDK code, debugging, and the production gaps the spec doesn't cover.

Master tool use in the Claude API. Schema design, retry logic, multi-step loops, and the failure modes that only show up at 10k calls a day.

Five worked examples showing how the new Developers Digest products plug into each other. Real agent filesystems, auto-snapshots, gated skill libraries, eval suites, and a recursive MCP host.

agentfs is filesystem-shaped storage for AI agents. Postgres-backed on Neon, no cold starts, no exec by design. Pay-only plans start at twenty dollars.

Ten private tools shipped overnight - observability, skills, hooks, prompts, and evals - aimed at the agent infrastructure gap small teams keep falling into.

The math of agent pipelines is brutal. 85% reliability per step compounds to about 20% at 10 steps. Here is why long chains collapse in production, and the six patterns the field has converged on to fight the decay.

From single-agent baselines to multi-level hierarchies, these are the seven patterns for wiring AI agents together in production. Each with a decision rule, an implementation sketch, and the tradeoffs that actually matter.
Multica hit 17k GitHub stars this week with a bold idea - treat coding agents like teammates, not tools. Here is what it actually does and whether it delivers.

Five managed-agent providers, five pricing models, zero unified cost attribution. If you're running agents overnight, you need FinOps you don't have yet.

Four agents, same tasks. Honest trade-offs from a developer shipping production apps with all of them.

CLAUDE.md is the highest-leverage file in any Claude Code project. Here's what goes in one, what doesn't, and the patterns that actually ship.

Autocomplete wrote the line. Agents write the pull request. The shift from Copilot to Claude Code, Cursor Agent, and Devin - explained with links to the docs that prove every claim.

MCP is the USB-C of AI agents. What the Model Context Protocol is, why Anthropic built it, and how to install your first server in Claude Code or Cursor. Fact-checked against the official MCP spec.

Claude Code is Anthropic's AI coding agent for your terminal. What it does, how it works, how it compares to Cursor and Codex, and how to ship your first feature with it. Fact-checked against official docs.

A practical security playbook for running Codex cloud tasks safely in 2026 using OpenAI docs: internet access controls, domain allowlists, HTTP method limits, and review workflows.

Hacker News keeps arguing about Claude Code, Codex, skills, MCP, and orchestration. Under the noise, the same four truths keep surfacing: workflows matter more than demos, verification is the bottleneck, skills beat prompts, and orchestration matters more than raw autonomy.

How to use AI agents to plan, scaffold, build, test, and deploy a SaaS product. Parallel development patterns, real workflow examples, and the operational details that determine whether your AI-assisted build succeeds or fails.

Context engineering is the practice of designing the persistent information that surrounds every AI interaction. CLAUDE.md files, system prompts, skill libraries, and memory systems. It is the single highest-leverage skill for developers working with AI agents in 2026.

Production-tested patterns for orchestrating AI agent teams - from fan-out parallelism to hierarchical delegation. Covers CrewAI, LangGraph, AutoGen, OpenAI Agents SDK, Google ADK, and custom approaches with real code.

AI agents that reflect on failures, accumulate skills, and get better with every session. Reflection patterns, memory architectures, skill extraction, and working code examples for building agents that actually learn.

Agents forget everything between sessions. Here are the patterns that fix that: CLAUDE.md persistence, RAG retrieval, context compression, and conversation summarization.

AI agents fail in ways traditional debugging cannot catch. Here are the tools and patterns for finding and fixing broken agent loops, tool failures, and context issues.

AI agent skills are not just for developers. Here is how 12 professions use packaged AI workflows to do better knowledge work.
A step-by-step guide to building AI agents that actually work. Choose a framework, define tools, wire up the loop, and ship something real.
How to spec agent tasks that run overnight and wake up to verified, reviewable code. The spec format, pipeline, and review workflow.

AI agents use LLMs to complete multi-step tasks autonomously. Here is how they work and how to build them in TypeScript.

A practical guide to building AI agents with TypeScript using the Vercel AI SDK. Tool use, multi-step reasoning, and real patterns you can ship today.

From swarms to pipelines - here are the patterns for coordinating multiple AI agents in TypeScript applications.

AI coding agents are submitting pull requests to open source repos - and some CONTRIBUTING.md files now contain prompt injections targeting them.

MCP lets AI agents connect to databases, APIs, and tools. Here is what it is and how to use it in your TypeScript projects.

OpenClaw has 247K stars and zero MCPs. The best tools for AI agents aren't new protocols - they're the CLIs developers have used for decades.

Composio is a tool infrastructure layer that connects AI agents to Gmail, GitHub, Slack, Google Calendar, and hundreds more apps - all auth handled for you. Here is how to set it up and start building real cross-app workflows.

OpenAI released their Agents SDK for TypeScript with first-class support for tool calling, structured outputs, multi-agent coordination, streaming, and human-in-the-loop approvals. Here is how each piece works.

OpenAI's Deep Research is an AI agent inside ChatGPT that plans and executes multi-step research workflows, browsing dozens of websites and producing cited reports in minutes instead of hours.

OpenAI added scheduled tasks and reminders to ChatGPT, turning it from a chat interface into something closer to a personal AI agent. Here is how it works, what it can do today, and where this is heading.

Google's Gemini Advanced includes a deep research feature that searches dozens of websites, verifies information across multiple sources, and generates detailed cited reports. Here is how it works and how it compares to other AI research tools.

Wire a Python LangGraph agent into a Next.js frontend using CopilotKit's co-agent architecture. Full walkthrough covering the graph, search nodes, streaming state, and the React UI.
Lightweight Python framework for multi-agent systems. Agent handoffs, tool use, guardrails, tracing. Successor to the experimental Swarm project.
AI FrameworksWorkflow automation platform with native AI agent building. Visual editor plus JavaScript/Python code nodes, 500+ integrations, self-hostable under a fair-code license.
ProductivityConfigure Claude Code for maximum productivity -- CLAUDE.md, sub-agents, MCP servers, and autonomous workflows.
GuideWhat MCP servers are, how they work, and how to build your own in 5 minutes.
GuideStep-by-step guide to building an MCP server in TypeScript - from project setup to tool definitions, resource handling, testing, and deployment.
GuideDeep comparison of the top AI agent frameworks - LangGraph, CrewAI, Mastra, CopilotKit, AutoGen, and Claude Code.
Guide
New tutorials, open-source projects, and deep dives on coding agents - delivered weekly.
Explore 505 topics
Browse All Topics