20 items
20 posts
Filippo Valsorda argues that LLMs have ended the era of treating security researchers with kid gloves. When anyone can discover vulnerabilities with an AI, the old coordinated disclosure model breaks down.
Unsloth's dynamic quantization makes GLM-5.2 runnable on a 256GB Mac or a 24GB GPU with CPU offloading. Here is the hardware math, the quantization tradeoffs, and what the HN community learned from actually running it.
A new paper shows a 3B parameter model hitting 94.3 on AIME26 and 96.1% on LeetCode contests - matching or exceeding models 100x its size. The catch: it traded general knowledge for pure reasoning ability.
Switzerland's fully open foundation model promises transparent training data and EU compliance. The HN crowd has questions about actual performance.
New research from MIT reveals that LLMs identify speakers by writing style, not by tags - meaning attackers who sound like the system effectively become the system. The findings explain why prompt injection remains unsolved.
New benchmark data shows GPT-5.5 hallucinates 86% of the time when it does not know the answer - versus 28% for the open-weights GLM-5.2. The numbers challenge the assumption that bigger models equal more reliable output.
Modern LLMs now use MoE routing, mixed attention variants, and fused vision encoders. The simple transformer stack is gone - here's what replaced it and why it matters for developers.
Anthropic's docs say the tokenizer introduced with Opus 4.7 can use up to 35% more tokens for the same text. Here is what that does to per-request cost, max_tokens, and cross-model comparisons.
Fable 5 1M context workflows that actually work: whole-repo reviews, log archaeology, multi-doc synthesis - plus the honest math on when RAG still wins.
Fable 5 effort levels explained: what low, medium, high, xhigh, and max actually change, which models support each level, and how effort drives your token bill.
A practical playbook for running Claude Fable 5 as the orchestrator over Sonnet and Haiku workers, with verified cost math on when the premium pays off.
Task budgets give Claude a token countdown for the whole agentic loop, so the model paces itself instead of discovering the limit when max_tokens truncates it. Here is how the beta works on Fable 5, what it does not enforce, and where it fits next to effort and the Usage API.
A verified directory of the frontier AI models in June 2026 - Claude Fable 5, GPT-5.5, GPT-5.4, Gemini 3.1 Pro, and DeepSeek V4 - with pricing checked against official docs.
Claude Fable 5 latency measured: 109 seconds to first token at max effort vs 1.4s for Sonnet 4.6. When slow is fine, when it hurts, and how to route around it.
Migrating off retired GPT models in 2026: the live retirement table, what maps to what, an eval-before-switch day plan, and when to jump providers.
Rewriting prompts and skills for Fable 5: what changes when you migrate agents from Opus 4.x, how effort interplay works, and which old workarounds now hurt.
Anthropic's Claude Fable 5 includes undisclosed interventions that silently degrade responses for certain ML development tasks - no fallback notice, no refusal, just worse answers.
Fable 5 posts an 80.3% SWE-Bench Pro score and costs 2x Opus 4.8 - here is the task-profile scoring guide that tells you when the premium pays off.
The Multi-Stream LLMs paper argues that agents are bottlenecked by single chat streams. The practical takeaway is not to rebuild everything today, but to design agent runtimes around separated channels.
A trending refusal-direction paper is a reminder that model safety cannot be treated as a thin refusal layer. Builders need layered controls around the model.

New tutorials, open-source projects, and deep dives on coding agents - delivered weekly.