51 items
51 posts
Goal, loop, routine. Three verbs, two tools, one hard part. A complete field guide to running agentic loops in Claude Code and Codex, the real commands, the patterns people actually run, and the two failure modes that burn money.
MCP's new enterprise-managed authorization flow is not just less login friction. It moves agent tool access into identity, policy, and audit systems enterprises already understand.
Cohere shipped its first developer-facing model on June 9, 2026. North Mini Code is a 30B mixture-of-experts coding model with 3B active parameters, Apache 2.0 weights, and a deployment footprint of a single H100. Here is what it actually offers and where the open questions are.
The viral DN42 AWS bill story is funny until you realize the missing primitive: infrastructure agents need hard cloud-spend guardrails before they touch real accounts.
Choosing a local coding LLM in 2026 means balancing benchmark performance, hardware cost, and the compliance pressure to keep code off third-party servers. Here is what to run and on what hardware.
A Hacker News thread on config files that run code points at the next AI coding risk: agent hooks, skills, and editor rules need review like executable dependencies.
OpenAI's harness engineering post and new token-use research point to the same lesson: agentic coding teams need token budgets, receipts, and eval loops, not vibes.
The rsync Claude debate shows why teams need reproducible defect forensics before AI attribution becomes a public blame machine.
Anthropic's open-source vulnerability harness shows where AI security work is going: reproducible exploit loops, separate verification agents, and patch receipts.
Anthropic's Claude containment writeup points to the next security layer for coding agents: deterministic capability ledgers, not another approval prompt.
GitHub Trending is full of agent memory and context tools. The useful version is not magic recall. It is a context ledger: source-linked, scoped, expiring memory that agents can inspect and users can audit.
The ChatGPT for Google Sheets exfiltration report is not just a spreadsheet bug. It is a warning about agentic office tools: permissions need to be action-scoped, logged, revocable, and visible.
A huge Hacker News thread says domain expertise is the real moat in agentic coding. The sharper version: tacit judgment only compounds when you turn it into examples, tests, DSLs, and review gates.
Before an AI agent gets tools, files, APIs, MCP servers, or deployment access, decide what it can read, write, call, log, and roll back.
AI coding agents become safer when permissions, logs, and rollback are designed as one system. Here is the operating loop I would put around any agent that can edit code, run tools, or open pull requests.
Prompt injection stops being an abstract LLM risk once an agent can call tools. The practical defense is data boundaries, structured handoffs, tool guardrails, and approval gates around side effects.
CodeGraph is trending because AI coding teams are running into the same bottleneck: agents waste too many tokens rediscovering the repo. Local indexes help, but only if you treat them as navigation aids instead of source truth.
AI coding agents have crossed from demo to daily workflow. The next bottleneck is not demand. It is cost attribution, budget gates, and workflow design that keeps agent fleets from turning useful work into surprise spend.
A front-page Hacker News essay about being tired of AI answers points at a real developer problem: chat is too easy to launder into fake work. The fix is verifiable workflows, not more conversational polish.
Anthropic's knowledge-work plugin repo is trending because it packages skills, connectors, slash commands, and sub-agents around job functions. The interesting shift is from personal prompts to team-distributed operating systems.

New tutorials, open-source projects, and deep dives on coding agents - delivered weekly.