LLMs

20 posts

Blog · Jun 24, 2026

Vulnerability Reports Are Not Special Anymore

Filippo Valsorda argues that LLMs have ended the era of treating security researchers with kid gloves. When anyone can discover vulnerabilities with AI, the old coordinated disclosure model breaks down.

News · Hacker News · Security · AI · LLMs

Blog · Jun 23, 2026

GLM-5.2 Local Deployment: Running Z.ai's 744B Model on Consumer Hardware

Unsloth's dynamic quantization makes GLM-5.2 runnable on a 256GB Mac or a 24GB GPU with CPU offloading. Here is the hardware math, the quantization tradeoffs, and what the HN community learned from actually running it.

News · Hacker News · LLMs · Open Weights · Local AI · Quantization

Blog · Jun 23, 2026

VibeThinker-3B: A 3-Billion-Parameter Model That Outscores Opus 4.5 on Reasoning

A new paper shows a 3B-parameter model scoring 94.3 on AIME26 and 96.1% on LeetCode contests - matching or exceeding models 100x its size. The catch: it traded general knowledge for pure reasoning ability.

News · Hacker News · LLMs · Small Models · AI Research · Reasoning

Blog · Jun 22, 2026

Apertus: Europe's Answer to AI Sovereignty - and Why HN Is Skeptical

Switzerland's fully open foundation model promises transparent training data and EU compliance. The HN crowd has questions about actual performance.

News · Hacker News · AI Models · Open Source · LLMs

Blog · Jun 22, 2026

Prompt Injection Is Role Confusion - New ICML Research Explains Why LLMs Can't Tell Friend from Foe

New research from MIT finds that LLMs identify speakers by writing style, not by role tags - meaning attackers who sound like the system effectively become the system. The findings explain why prompt injection remains unsolved.

News · Hacker News · AI Security · LLMs · Research

Blog · Jun 20, 2026

GPT-5.5 Has a 3x Higher Hallucination Rate Than MIT-Licensed GLM-5.2

New benchmark data shows GPT-5.5 hallucinates 86% of the time when it does not know the answer - versus 28% for the open-weights GLM-5.2. The numbers challenge the assumption that bigger models equal more reliable output.

News · Hacker News · LLMs · GPT · Benchmarks · Open Weights

Blog · Jun 20, 2026

LLM Architectures Got Complicated Fast

Modern LLMs now use MoE routing, mixed attention variants, and fused vision encoders. The simple transformer stack is gone - here's what replaced it and why it matters for developers.

News · Hacker News · AI · LLMs · Machine Learning

Blog · Jun 11, 2026

The Claude Tokenizer Change: What ~30% More Tokens Means for Your Bill

Anthropic's docs say the tokenizer introduced with Opus 4.7 can use up to 35% more tokens for the same text. Here is what that does to per-request cost, max_tokens, and cross-model comparisons.

Anthropic · AI Models · LLMs · Developer Tools

Blog · Jun 11, 2026

Fable 5 with 1M Context: What Actually Works in Practice

The workflows that actually hold up with Fable 5's 1M-token context: whole-repo reviews, log archaeology, multi-doc synthesis - plus the honest math on when RAG still wins.

AI Models · Anthropic · Context Engineering · LLMs

Blog · Jun 11, 2026

Fable 5 Effort Levels Explained: low to xhigh, and What They Cost You

What the low, medium, high, xhigh, and max effort levels actually change on Fable 5, which models support each level, and how effort drives your token bill.

Anthropic · AI Models · Claude Code · LLMs

Blog · Jun 11, 2026

The Fable 5 Orchestrator Playbook: One Smart Model Managing Cheap Workers

A practical playbook for running Claude Fable 5 as the orchestrator over Sonnet and Haiku workers, with verified cost math on when the premium pays off.

AI Agents · Anthropic · AI Models · LLMs

Blog · Jun 11, 2026

Fable 5 Task Budgets: Capping Agent Spend Before It Happens

Task budgets give Claude a token countdown for the whole agentic loop, so the model paces itself instead of discovering the limit only when max_tokens truncates its output. Here is how the beta works on Fable 5, what it does not enforce, and where it fits alongside effort and the Usage API.

Anthropic · AI Agents · Developer Tools · LLMs

Blog · Jun 11, 2026

The Frontier Model Landscape, June 2026 Edition

A verified directory of the frontier AI models in June 2026 - Claude Fable 5, GPT-5.5, GPT-5.4, Gemini 3.1 Pro, and DeepSeek V4 - with pricing checked against official docs.

AI Models · LLMs · Pricing · Developer Tools

Blog · Jun 11, 2026

Is Claude Fable 5 Slow? Latency in Practice, and When It Matters

Claude Fable 5 latency measured: 109 seconds to first token at max effort vs 1.4 seconds for Sonnet 4.6. When slow is fine, when it hurts, and how to route around it.

AI Models · Anthropic · LLMs · Performance

Blog · Jun 11, 2026

Migrating Off Retired GPT Models in 2026: A Working Checklist

Migrating off retired GPT models in 2026: the live retirement table, which current model replaces each retired one, an eval-before-switch day plan, and when to jump providers.

OpenAI · AI Models · LLMs · Developer Tools

Blog · Jun 11, 2026

Rewriting Your Prompts and Skills for Fable 5

Rewriting prompts and skills for Fable 5: what changes when you migrate agents from Opus 4.x, how prompts interact with effort levels, and which old workarounds now hurt.

Anthropic · Prompt Engineering · AI Agents · LLMs

Blog · Jun 10, 2026

Fable 5's Hidden Guardrails: What Developers Need to Know About Silent Degradation

Anthropic's Claude Fable 5 includes undisclosed interventions that silently degrade responses for certain ML development tasks - no fallback notice, no refusal, just worse answers.

AI Safety · LLMs · Anthropic · Developer Tools · News Analysis

Blog · Jun 10, 2026

Fable 5 vs Opus 4.8: A Data-Driven Decision Guide for Engineering Teams

Fable 5 posts an 80.3% SWE-Bench Pro score and costs 2x as much as Opus 4.8 - here is the task-profile scoring guide that tells you when the premium pays off.

AI Models · Anthropic · Code Review · AI Agents · Developer Tools · LLMs

Blog · May 23, 2026

Multi-Stream LLMs Hint at the Next Agent Architecture

The Multi-Stream LLMs paper argues that agents are bottlenecked by single chat streams. The practical takeaway is not to rebuild everything today, but to design agent runtimes around separated channels.

AI Agents · LLMs · Research · Developer Workflow · Agent Architecture

Blog · May 2, 2026

Refusal Directions Are a Systems Problem

A trending refusal-direction paper is a reminder that model safety cannot be treated as a thin refusal layer. Builders need layered controls around the model.

AI Safety · LLMs · Agents · Developer Tools · Research