65 terms and counting. Written for developers building with AI, not researchers reading papers.
A development workflow where AI agents write, test, and iterate on code autonomously. Instead of suggesting completions, the agent plans multi-step tasks, runs commands, reads errors, and fixes them in a loop until the job is done.
The core cycle that powers AI agents: observe the current state, think about what to do next, act by calling a tool or generating output, then repeat. Each iteration feeds the result of the last action back into the model's context so it can decide the next step. This observe-think-act pattern is what separates agents from single-shot prompt-response interactions.
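The loop can be sketched in a few lines of TypeScript. Everything here is a stand-in: `think` fakes the model's decision and `act` fakes a tool run, where a real agent would call an LLM API and execute real commands.

```typescript
// A minimal sketch of the observe-think-act loop with stubbed parts.

type Action =
  | { type: "tool"; name: string; input: string }
  | { type: "finish"; answer: string };

// Hypothetical model stub: a real agent would call an LLM here,
// passing the accumulated context and parsing its chosen action.
function think(context: string[]): Action {
  if (!context.some((c) => c.includes("test: pass"))) {
    return { type: "tool", name: "run_tests", input: "npm test" };
  }
  return { type: "finish", answer: "All tests pass." };
}

// Hypothetical tool executor: pretend the command succeeded.
function act(action: { name: string; input: string }): string {
  return `${action.name} -> test: pass`;
}

function runAgent(goal: string, maxSteps = 10): string {
  const context: string[] = [goal]; // observation history
  for (let step = 0; step < maxSteps; step++) {
    const decision = think(context); // think: pick the next action
    if (decision.type === "finish") return decision.answer;
    context.push(act(decision)); // act, then feed the result back in
  }
  return "Step limit reached.";
}
```

The key design point is the `context.push` at the bottom of the loop: each action's result becomes an observation for the next iteration, which is exactly what single-shot prompting lacks.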
Software that uses a large language model to reason about goals, break them into steps, call external tools, and act on results without human intervention at each step. Agents differ from chatbots because they maintain a plan and execute it across multiple turns.
A secret string that authenticates your application with an external service. AI providers like OpenAI, Anthropic, and Google issue API keys so your code can send prompts and receive completions programmatically.
The core technique inside transformers that lets a model weigh the relevance of every token relative to every other token in a sequence. Instead of processing input left-to-right, attention computes relevance scores across all positions at once, allowing the model to focus on the most important parts of the context regardless of distance.
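The mechanism reduces to a small formula: score every key against the query, softmax the scores into weights, and take a weighted mix of the values. A toy sketch for a single query over a tiny sequence (real models run this across many heads and thousands of positions):

```typescript
// Scaled dot-product attention for one query: softmax(q·k / sqrt(d)) · v.

function softmax(xs: number[]): number[] {
  const m = Math.max(...xs);
  const exps = xs.map((x) => Math.exp(x - m));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

function dot(a: number[], b: number[]): number {
  return a.reduce((acc, x, i) => acc + x * b[i], 0);
}

function attend(query: number[], keys: number[][], values: number[][]): number[] {
  const scale = Math.sqrt(query.length);
  const scores = keys.map((k) => dot(query, k) / scale); // relevance per position
  const weights = softmax(scores); // normalize to a probability distribution
  // Weighted mix of the value vectors.
  return values[0].map((_, j) =>
    weights.reduce((acc, w, i) => acc + w * values[i][j], 0)
  );
}
```

Note that `scores` is computed for every position at once, which is the parallelism the entry describes.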
Running a coding agent in a mode where it completes entire features without pausing for human approval. The agent reads the codebase, writes code, runs tests, and commits - all from a single prompt.
Standardized tests that measure model performance on tasks like code generation, math, reasoning, and instruction following. Common benchmarks include SWE-bench, HumanEval, MMLU, and GPQA. They help developers compare models, but real-world performance often differs from benchmark scores.
Software that compiles, bundles, or transforms source code into production-ready output. In the AI dev space, build tools like Turbopack, Vite, and esbuild handle TypeScript compilation, module bundling, and hot reload for the frameworks that power AI applications.
A prompting technique where the model is asked to show its step-by-step reasoning before arriving at a final answer. CoT improves accuracy on math, logic, and coding tasks by forcing the model to decompose problems rather than jumping to conclusions. Reasoning models like o1 and o3 are trained to produce extended chains of thought internally before answering.

A markdown file placed in your project root that configures Claude Code's behavior. It defines project rules, coding conventions, file structure, and custom instructions that persist across sessions - acting as a project-specific system prompt for your AI coding agent.
Anthropic's official CLI tool for agentic coding. It runs in your terminal, reads your codebase, edits files, runs commands, and iterates on tasks autonomously. It uses CLAUDE.md for project config and supports sub-agents, hooks, and MCP integrations.
A text-based interface for interacting with software by typing commands. Many AI coding tools - Claude Code, Gemini CLI, Codex - run as CLIs because they integrate directly into the developer's terminal workflow alongside git, npm, and other standard tools.
The discipline of designing what information goes into a model's context window and how it is structured. Context engineering goes beyond prompt engineering by managing system prompts, retrieved documents, tool results, conversation history, and memory to give the model exactly the right information at the right time. It is the difference between a model that kind of works and one that works reliably.
The maximum amount of text (measured in tokens) that a model can process in a single request. Larger context windows let agents read more code at once. Modern models range from 128K to over 1M tokens, but effective use of context still matters more than raw size.
An AI agent that can perform extended, multi-step research or coding tasks by spawning sub-tasks, searching the web, reading documents, and synthesizing findings. Deep agents run for minutes or hours rather than seconds, tackling problems too complex for a single prompt-response cycle.
A class of generative models that learn to create data by reversing a gradual noising process. During training, the model learns to remove noise from corrupted data step by step. During generation, it starts from pure random noise and iteratively denoises it into coherent output. Diffusion models power image generators like Stable Diffusion, DALL-E, and Midjourney.
A training technique where a smaller "student" model learns to replicate the behavior of a larger "teacher" model. The student is trained on the teacher's outputs rather than raw data, inheriting much of the larger model's capability at a fraction of the size and inference cost. Distillation is how many fast, lightweight models are created from frontier models.
Numerical vector representations of text that capture semantic meaning. Similar concepts have vectors that are close together in high-dimensional space. Embeddings power semantic search, RAG systems, and recommendation engines by letting you find related content without exact keyword matches.
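"Close together" is usually measured with cosine similarity. A minimal sketch, using tiny made-up vectors (real embeddings have hundreds or thousands of dimensions):

```typescript
// Cosine similarity between two embedding vectors: ~1 means the texts
// are semantically similar, ~0 means unrelated.

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Semantic search is just this comparison run against every stored vector, then sorting by score.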
Serverless functions that run on CDN nodes close to the user rather than in a central data center. Platforms like Vercel (Edge Functions) and Cloudflare (Workers) use them to reduce latency for API routes, middleware, and AI inference endpoints.
The systematic process of testing an AI model's performance against a defined set of inputs and expected outputs. Evals measure whether a model is actually good at the task you care about, not just benchmarks. They can be automated (comparing outputs to ground truth) or human-judged (rating quality on a rubric). Running evals before and after changes is how teams catch regressions and validate improvements.
The process of training a pre-existing model on a custom dataset to specialize its behavior. Fine-tuning adjusts model weights for specific tasks - like classifying support tickets or generating code in a particular framework - without training from scratch.
A model capability where the LLM outputs structured JSON describing which function to call and with what arguments, rather than plain text. This lets AI applications reliably trigger actions like database queries, API calls, or tool use based on natural language input.
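A sketch of the receiving end. The JSON shape below is illustrative rather than any provider's exact format, and `get_weather` is a hypothetical tool; real SDKs return a similar structure with a function name and JSON-encoded arguments that your code dispatches on.

```typescript
// Dispatching a function-call response from a model.

type FunctionCall = { name: string; arguments: string };

// Hypothetical local tools the model is allowed to invoke.
const tools: Record<string, (args: any) => string> = {
  get_weather: ({ city }) => `Sunny in ${city}`,
};

function dispatch(call: FunctionCall): string {
  const fn = tools[call.name];
  if (!fn) throw new Error(`Unknown tool: ${call.name}`);
  return fn(JSON.parse(call.arguments)); // run it, return the result to the model
}

// What the model might emit instead of plain text:
const modelOutput: FunctionCall = {
  name: "get_weather",
  arguments: '{"city": "Berlin"}',
};
```

In a full agent, the string returned by `dispatch` goes back into the conversation so the model can use the result in its next turn.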
AI systems that create new content - text, images, code, audio, video - rather than just classifying or analyzing existing data. LLMs like Claude and GPT are generative models trained to predict and produce sequences of tokens based on input prompts.
The practice of structuring web content so AI models cite and surface it in their responses. While SEO targets search engine rankings, GEO targets AI-generated answers by using clear definitions, structured data (JSON-LD), and authoritative formatting that models can extract.
Connecting a model's responses to verified, external data sources rather than relying solely on its training data. Grounding techniques include RAG, tool use, and web search - they reduce hallucinations by giving the model facts to reference instead of generating from memory alone.
Safety constraints and validation layers applied to AI model inputs and outputs. Guardrails can block harmful content, enforce output formats, prevent prompt injection, filter sensitive data, and keep responses on-topic. They are typically implemented as middleware that wraps model calls rather than modifications to the model itself.
When a model generates confident-sounding information that is factually incorrect or fabricated. Hallucinations happen because LLMs predict plausible-sounding text, not verified facts. Techniques like RAG, grounding, and structured output help reduce but do not eliminate hallucinations.
User-defined shell commands that run automatically at specific points in the Claude Code lifecycle - before a tool executes, after a tool completes, or when a notification fires. Hooks let you enforce project rules, run linters, or trigger custom workflows without modifying the agent itself.
A software application that combines a code editor, debugger, terminal, and tooling into one workspace. VS Code, Cursor, and Zed are popular IDEs. AI coding assistants increasingly integrate into IDEs or replace them entirely with CLI-based workflows.
The process of running input through a trained model to get a prediction or output. When you send a prompt to an API and get a response, that is inference. Inference cost, speed, and latency are key factors when choosing between AI providers and models.
A standardized format for embedding structured data in web pages using JSON syntax within a script tag. Search engines and AI models read JSON-LD to understand page content - common schemas include Article, FAQPage, DefinedTermSet, and BreadcrumbList.
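For a glossary entry, the relevant schema is DefinedTerm. A sketch of building that structured data in TypeScript, using an example term from this page (the exact property set a site uses may vary):

```typescript
// A DefinedTerm entry serialized as JSON-LD, the kind of payload a page
// embeds inside a <script type="application/ld+json"> tag.

const termJsonLd = {
  "@context": "https://schema.org",
  "@type": "DefinedTerm",
  name: "Context Window",
  description:
    "The maximum amount of text, measured in tokens, that a model can process in a single request.",
  inDefinedTermSet: { "@type": "DefinedTermSet", name: "AI Glossary" },
};

const jsonLd = JSON.stringify(termJsonLd, null, 2);
```

The `@context` and `@type` keys are what let crawlers and AI models interpret the rest of the object against schema.org vocabulary.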
A structured repository of information that an AI system can query to answer questions or provide context. In AI applications, knowledge bases are often backed by vector databases and used in RAG pipelines, letting models access up-to-date, domain-specific facts that were not part of their training data.
The compressed, high-dimensional representation that a neural network learns internally. Each point in latent space encodes a meaningful combination of features from the training data. Navigating latent space is how generative models interpolate between concepts, and it is why models can produce novel outputs that blend characteristics of things they have seen before.
A neural network trained on massive text datasets that can generate, summarize, translate, and reason about language. Models like Claude, GPT, Gemini, and Llama are LLMs. They power chatbots, coding agents, search tools, and most modern AI applications.
A parameter-efficient fine-tuning method that trains a small set of adapter weights instead of modifying the full model. LoRA makes fine-tuning practical on consumer hardware and is widely used in the open-source model community for creating specialized model variants.
An open protocol created by Anthropic that standardizes how AI applications connect to external data sources and tools. MCP servers expose resources, tools, and prompts through a common interface so any MCP-compatible client can use them without custom integration code.
A model architecture that routes each input to a small subset of specialized sub-networks ("experts") rather than activating the entire model. A gating network decides which experts handle each token, so the model can have a massive total parameter count while only using a fraction of them per inference pass. MoE powers models like Mixtral (and, reportedly, GPT-4), delivering strong performance at lower compute cost than dense models of equivalent size.
Architectures where multiple AI agents collaborate on a task, each handling a specialized role. One agent might research while another writes code and a third reviews it. Multi-agent patterns include orchestrator-worker, pipeline, and swarm topologies.
AI models that can process and generate more than one type of data - text, images, audio, video, or code. A multi-modal model can analyze a screenshot, read the text in it, and generate code that reproduces the UI, all in a single interaction.
A computing architecture loosely inspired by biological neurons, made up of layers of interconnected nodes that transform input data through learned weights and activation functions. Neural networks are the foundation of modern AI. Stacking many layers creates "deep" neural networks, which is where the term deep learning comes from.
The field of AI focused on enabling computers to understand, interpret, and generate human language. NLP covers everything from tokenization and sentiment analysis to machine translation and conversational AI. LLMs represent the current state of the art in NLP, but the field also includes older techniques like TF-IDF, named entity recognition, and dependency parsing.
AI models released with publicly available weights that anyone can download, run, fine-tune, and deploy. Models like Llama, Qwen, Mistral, and DeepSeek offer alternatives to closed APIs, enabling local inference, customization, and full control over your AI stack.
The coordination layer that manages how AI agents, tools, and data sources work together in a pipeline. Orchestration handles routing prompts to the right model, managing context across steps, retrying failures, and combining results from parallel sub-tasks.
A metric that measures how well a language model predicts a sequence of tokens. Lower perplexity means the model is less "surprised" by the text and assigns higher probability to the correct next tokens. Perplexity is commonly used to compare language models during training and evaluation, though it does not always correlate perfectly with real-world task performance.
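The definition translates directly into code: perplexity is the exponential of the negative mean log-probability the model assigned to each token. The probabilities below are made up for illustration.

```typescript
// Perplexity from per-token probabilities: exp(-mean(log p)).

function perplexity(tokenProbs: number[]): number {
  const meanLogProb =
    tokenProbs.reduce((acc, p) => acc + Math.log(p), 0) / tokenProbs.length;
  return Math.exp(-meanLogProb);
}
```

A model that assigns probability 0.25 to every token has a perplexity of 4, i.e. it is as "surprised" as if it were guessing uniformly among four options; a perfect model scores 1.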
The practice of designing and iterating on prompts to get consistent, high-quality outputs from AI models. Good prompt engineering involves clear instructions, examples (few-shot), structured output formats, and systematic testing - not guesswork.
A company or service that hosts AI models and exposes them through an API. Major providers include Anthropic (Claude), OpenAI (GPT), Google (Gemini), and Meta (Llama via third parties). Frameworks like the Vercel AI SDK abstract over providers so you can switch models without rewriting code.
The process of reducing the numerical precision of a model's weights, typically from 16-bit or 32-bit floating point down to 8-bit, 4-bit, or even lower. Quantization dramatically reduces model size and memory usage, making it possible to run large models on consumer GPUs and edge devices. The trade-off is a small loss in output quality, though modern quantization techniques like GPTQ and AWQ minimize this gap.
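A round-trip sketch of the simplest scheme, symmetric per-tensor 8-bit quantization: map each weight to an integer in [-127, 127] using one shared scale, then multiply back to recover approximate floats. Real methods like GPTQ and AWQ are considerably more sophisticated (per-group scales, calibration data), but the core trade is the same.

```typescript
// Symmetric int8 quantization with a single per-tensor scale.

function quantize(weights: number[]): { q: Int8Array; scale: number } {
  const maxAbs = Math.max(...weights.map(Math.abs));
  const scale = maxAbs / 127; // one float recovers the whole range
  const q = Int8Array.from(weights.map((w) => Math.round(w / scale)));
  return { q, scale };
}

function dequantize(q: Int8Array, scale: number): number[] {
  return Array.from(q, (v) => v * scale);
}
```

Storage drops from 4 bytes per weight (float32) to 1 byte plus a single scale, at the cost of small rounding errors in the recovered values.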
A pattern that improves LLM responses by retrieving relevant documents from an external knowledge base and injecting them into the prompt before generation. RAG gives the model up-to-date, domain-specific context without fine-tuning, reducing hallucinations and keeping responses grounded in real data.
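The whole pipeline fits in a few functions: embed the question, rank stored documents by similarity, and inject the top matches into the prompt. The `embed` function below is a fake keyword counter standing in for a real embedding model, and the documents are toy examples.

```typescript
// Minimal RAG sketch: retrieve the closest documents, build a grounded prompt.

const docs = [
  "Claude Code runs in the terminal.",
  "Edge functions run close to the user.",
];

// Stand-in embedding: keyword indicators, NOT a real embedding model.
function embed(text: string): number[] {
  return ["terminal", "edge", "user"].map((w) =>
    text.toLowerCase().includes(w) ? 1 : 0
  );
}

function similarity(a: number[], b: number[]): number {
  return a.reduce((acc, x, i) => acc + x * b[i], 0);
}

function buildPrompt(question: string, topK = 1): string {
  const qVec = embed(question);
  const ranked = [...docs].sort(
    (d1, d2) => similarity(embed(d2), qVec) - similarity(embed(d1), qVec)
  );
  const context = ranked.slice(0, topK).join("\n");
  return `Answer using only this context:\n${context}\n\nQuestion: ${question}`;
}
```

In production the document store is a vector database and `embed` is an API call, but the retrieve-then-prompt shape is unchanged.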
Models specifically trained or prompted to show their step-by-step thinking process before producing a final answer. Models like o1, o3, and Claude with extended thinking use chain-of-thought reasoning to tackle complex math, logic, and coding problems more reliably.
A training technique that fine-tunes a model using human preference judgments. Humans rank model outputs from best to worst, and those rankings train a reward model. The language model is then optimized via reinforcement learning to produce outputs the reward model scores highly. RLHF is a key step in making raw pre-trained models helpful, harmless, and aligned with human intent.
Delivering model output token-by-token as it is generated rather than waiting for the full response. Streaming improves perceived latency in chat interfaces and coding agents. The Vercel AI SDK and most provider SDKs support streaming responses out of the box.
Constraining a model to respond in a specific format - typically JSON matching a defined schema. Structured outputs eliminate parsing failures and make AI responses reliable enough to pipe directly into application logic. Zod schemas are commonly used to define the expected shape.
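The idea, sketched without Zod: define the expected shape, parse the model's JSON, and reject anything that does not match before it reaches application logic. The `Ticket` shape is a made-up example.

```typescript
// Hand-rolled validation of a model's structured output.

type Ticket = { title: string; priority: "low" | "medium" | "high" };

function parseTicket(raw: string): Ticket {
  const data = JSON.parse(raw);
  if (typeof data.title !== "string") throw new Error("title must be a string");
  if (!["low", "medium", "high"].includes(data.priority)) {
    throw new Error("invalid priority");
  }
  return data as Ticket;
}
```

With Zod the same check collapses to a schema declaration, roughly `z.object({ title: z.string(), priority: z.enum(["low", "medium", "high"]) }).parse(...)`, which also infers the TypeScript type for free.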
Lightweight AI agents spawned by a parent agent to handle a specific sub-task in parallel. Claude Code uses sub-agents (via the Task tool) to divide work - one sub-agent researches while another writes code - then the parent synthesizes the results.
A benchmark that evaluates AI coding agents on real-world software engineering tasks pulled from GitHub issues. Each task requires the agent to read a codebase, understand the bug or feature request, and produce a working patch. SWE-bench has become the standard measure for how well AI agents can do actual software development, not just isolated code generation.
Hidden instructions prepended to every conversation that define an AI model's behavior, personality, and constraints. System prompts set the rules that the model follows - like response format, tone, and what topics to avoid. CLAUDE.md files serve a similar purpose for coding agents.
A parameter (typically 0 to 2) that controls how random or creative a model's output is. Low temperature (0-0.3) produces focused, deterministic responses ideal for code generation. High temperature (0.7-1.5) produces more varied, creative outputs better for brainstorming.
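Mechanically, temperature divides the model's logits before the softmax. A sketch showing how that reshapes a next-token distribution (the logits here are arbitrary examples):

```typescript
// Temperature-scaled softmax: lower T sharpens the distribution toward
// the top token, higher T flattens it toward uniform.

function sampleDistribution(logits: number[], temperature: number): number[] {
  const scaled = logits.map((l) => l / temperature);
  const m = Math.max(...scaled);
  const exps = scaled.map((l) => Math.exp(l - m));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}
```

At temperature 0.1 the top token takes nearly all the probability mass (near-deterministic output); at 2.0 the alternatives stay live, which is where the "creative" behavior comes from.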
The basic unit of text that LLMs process. A token is roughly 3-4 characters or about 0.75 words in English. Models have token limits for input (context window) and output (max completion). API pricing is typically measured per million tokens.
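The ~4 characters per token rule of thumb is enough for rough budgeting. A sketch (real tokenizers give exact counts, and the price is whatever your provider charges per million tokens):

```typescript
// Back-of-envelope token and cost estimates for English text.

function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4); // ~4 chars per token, English prose
}

function estimateCostUSD(text: string, pricePerMillionTokens: number): number {
  return (estimateTokens(text) / 1_000_000) * pricePerMillionTokens;
}
```

This is how you sanity-check whether a document will fit a context window, or what a batch job will cost, before calling the API.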
A model capability where the LLM can invoke external tools - running code, searching the web, reading files, calling APIs - as part of generating a response. Tool use turns a passive text generator into an active agent that can interact with the real world.
The neural network architecture behind virtually all modern large language models. Introduced in the 2017 paper "Attention Is All You Need," transformers use self-attention mechanisms to process all tokens in a sequence simultaneously rather than one at a time. This parallelism makes them vastly more efficient to train than previous architectures like RNNs and LSTMs, and is the reason LLMs were able to scale to billions of parameters.
A category of machine learning where models learn patterns from data without labeled examples or explicit correct answers. The model discovers structure on its own, such as clusters, correlations, or compressed representations. LLM pre-training is a form of unsupervised (or self-supervised) learning, where the model learns to predict the next token from massive amounts of unlabeled text.
A database optimized for storing and querying high-dimensional vectors (embeddings). Vector databases like Pinecone, Weaviate, and pgvector enable fast similarity search, powering RAG pipelines, semantic search, and recommendation systems at scale.
A development approach where you describe what you want in natural language and let an AI agent handle the implementation details. The developer focuses on the overall direction and feel of the project rather than writing every line. The term was coined by Andrej Karpathy.
The numerical parameters inside a neural network that are learned during training. Weights determine how the network transforms input into output. When people say a model has 70 billion parameters, they mean 70 billion weights. Releasing model weights publicly is what makes open-source AI models possible, since anyone with the weights can run inference without depending on an API provider.
A git feature that lets one repository have multiple working directories, each checked out to a different branch. Claude Code uses worktrees to run multiple agents on separate branches simultaneously without conflicts, letting agents work on different features in parallel and merge results back.
A TypeScript-first schema declaration and validation library. In AI development, Zod defines the shape of structured outputs from LLMs, validates API payloads, and ensures type safety at runtime. It is the standard schema tool in the Vercel AI SDK and most TypeScript AI stacks.
