TL;DR
OpenAI has released its first open-weight models in over five years. GPT-OSS 120B and GPT-OSS 20B are now available under the Apache 2.0 license, marking a significant shift in strategy for the company. These are reasoning models built on a Mixture of Experts (MoE) architecture, designed to run efficiently on consumer hardware while delivering competitive performance against frontier closed models.

Two variants are available:
GPT-OSS 20B - The efficient option. Activates 3.6 billion parameters per token and runs on a laptop with 16GB of RAM. Suitable for offline, private deployments where data cannot leave the local environment.
GPT-OSS 120B - The larger variant. Activates 5.1 billion parameters per token despite its name, deployable on a single 80GB GPU such as an NVIDIA A100. This model targets production applications requiring higher capability.
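The active-parameter figures above are what make MoE economical: per-token compute tracks the active experts, not the full weight count. Taking OpenAI's stated totals of roughly 117B and 21B parameters, a quick sketch of the active fraction:

```python
# Rough MoE compute comparison: only the routed experts run per token,
# so per-token FLOPs scale with active parameters, not total parameters.
def active_fraction(total_b: float, active_b: float) -> float:
    """Fraction of weights exercised per token (billions in, ratio out)."""
    return active_b / total_b

print(f"gpt-oss-120b: {active_fraction(117, 5.1):.1%} of weights active per token")
print(f"gpt-oss-20b:  {active_fraction(21, 3.6):.1%} of weights active per token")
```

Memory requirements still scale with the total parameter count, which is why the 120B model needs an 80GB GPU even though it computes like a much smaller dense model.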
Both models support a 128,000 token context window and were trained primarily on English text with emphasis on STEM, coding, and general knowledge. OpenAI is also open-sourcing the o200k_harmony tokenizer as part of this announcement, a superset of the o200k tokenizer used for GPT-4o and GPT-4o mini.
The standout feature is the integration of tool use within the reasoning process. During the post-training phase, OpenAI trained these models to invoke tools like web search and code execution before finalizing responses. This happens inside the chain-of-thought trace.
This architecture eliminates the need for external agent orchestration. The model can search, evaluate results, and decide to search again if the first query fails, all within its internal reasoning loop. For developers building agentic applications, this reduces complexity significantly. No separate agent framework is required to handle tool selection, reflection, and iterative refinement.
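For contrast, this is a minimal sketch of the client-side orchestration loop that conventional tool use requires, the part GPT-OSS's in-reasoning tool use is meant to absorb. The `web_search` stub and the scripted `fake_model` are illustrative stand-ins, not part of the release:

```python
import json

# Hypothetical tool; name and behavior are illustrative only.
def web_search(query: str) -> str:
    return json.dumps({"results": [f"stub result for {query!r}"]})

TOOLS = {"web_search": web_search}

def run_agent_loop(model_step, user_prompt: str, max_turns: int = 5) -> str:
    """Drive a model that may emit tool calls before its final answer.

    `model_step(messages)` stands in for a chat call to a model endpoint;
    it returns either {"tool_call": {"name": ..., "arguments": {...}}}
    or {"content": ...}.
    """
    messages = [{"role": "user", "content": user_prompt}]
    for _ in range(max_turns):
        reply = model_step(messages)
        if "tool_call" in reply:
            call = reply["tool_call"]
            result = TOOLS[call["name"]](**call["arguments"])
            messages.append(
                {"role": "tool", "name": call["name"], "content": result}
            )
            continue  # hand the tool output back for another reasoning pass
        return reply["content"]
    raise RuntimeError("agent did not finish within max_turns")

# Scripted stand-in for the model: search once, then answer.
def fake_model(messages):
    if messages[-1]["role"] != "tool":
        return {"tool_call": {"name": "web_search",
                              "arguments": {"query": "gpt-oss license"}}}
    return {"content": "GPT-OSS is Apache 2.0 licensed."}

print(run_agent_loop(fake_model, "What license does GPT-OSS use?"))
```

With GPT-OSS, the search/evaluate/retry cycle happens inside the model's chain of thought, so the client sees a single request and response instead of managing this loop itself.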

The 120B model outperforms o3-mini across standard benchmarks, even without tool access. Against the full o3 model, it remains competitive.
| Benchmark | GPT-OSS 120B | GPT-OSS 20B |
|---|---|---|
| MMLU | 90.0% | 85.3% |
| GPQA Diamond | 80.1% | 71.5% |
| Humanity's Last Exam | Strong | Strong for size |
| Competition Math | Near o3/o4-mini | Competitive |
On Artificial Analysis aggregations, these models sit respectably against Gemini 2.5, Grok 2, and other frontier systems. The critical caveat: these are not code-generation specialists. They will not build full web applications from a prompt the way Claude Opus or similar top-tier coding models can. They excel at reasoning, analysis, and tool-augmented tasks rather than end-to-end application generation.

Because these models are Apache 2.0 licensed, hosting competition is already aggressive.
Groq delivers over 1,000 tokens per second on the 20B model and approximately 500 tokens per second on the 120B variant. OpenRouter provides unified billing across providers with transparent latency and throughput metrics if you prefer a single integration point.

For local execution, HuggingFace hosts the model weights. Ollama provides the simplest setup path:
```shell
ollama run gpt-oss   # defaults to the 20B model
```
For the 120B model, you need hardware like an A100 or an M3 Max with substantial RAM.
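Once Ollama is serving a model, it exposes an OpenAI-compatible endpoint on localhost. A minimal sketch of calling it with only the standard library (assumes `ollama run gpt-oss` is already running; the prompt and helper names are illustrative):

```python
import json
import urllib.request

# Ollama's OpenAI-compatible chat endpoint on the default local port.
OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

def build_request(prompt: str, model: str = "gpt-oss:20b") -> dict:
    """Payload in the OpenAI chat-completions shape that Ollama accepts."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def ask(prompt: str) -> str:
    """Send one prompt to the local model and return its reply text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# ask("Summarize MoE routing in one sentence.")  # needs Ollama serving locally
print(build_request("hello")["model"])
```

Because the endpoint speaks the OpenAI wire format, any OpenAI SDK pointed at `http://localhost:11434/v1` works the same way.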
Cloud deployment options include Groq for low-latency inference, Fireworks for cost optimization, and OpenRouter for multi-provider access. Each platform exposes the standard OpenAI-compatible API, making migration straightforward.
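Since every host speaks the OpenAI wire format, switching providers is mostly a matter of changing the base URL and key. A sketch, using the base URLs each provider documents for OpenAI compatibility (verify current values and model IDs, e.g. `openai/gpt-oss-120b`, in their docs before deploying):

```python
# Each provider's documented OpenAI-compatible base URL.
PROVIDERS = {
    "groq": "https://api.groq.com/openai/v1",
    "openrouter": "https://openrouter.ai/api/v1",
}

def client_config(provider: str, api_key: str) -> dict:
    """Keyword arguments for an OpenAI-compatible client, e.g.
    openai.OpenAI(**client_config("groq", key)). Only the base URL
    and key change when switching hosts."""
    return {"base_url": PROVIDERS[provider], "api_key": api_key}

print(client_config("openrouter", "sk-demo")["base_url"])
```

Keeping the provider choice in configuration rather than code is what makes the "migration straightforward" claim real in practice: no request or response handling changes between hosts.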
GPT-OSS fills a specific niche: capable reasoning with tool integration at low cost and manageable hardware requirements. These models are not replacements for top-tier closed models on creative or complex coding tasks. They are practical choices for applications requiring reasoning, moderate coding assistance, and agentic tool use without the infrastructure overhead of massive parameter counts or closed API dependencies.