NVIDIA's Nemotron 3 Super in 6 Minutes

NVIDIA's Nemotron 3 Super combines a latent mixture-of-experts design with a hybrid Mamba-Transformer architecture: 120B total parameters, 12B active per token, a 1M-token context window, and up to 4x more experts at the same cost.

A practical walkthrough of Nemotron 3 Super: latent mixture of experts, the hybrid Mamba-Transformer architecture, the 1M-token context window, reasoning modes, and the code you actually need to run it on NVIDIA hardware.

NVIDIA's Nemotron Nano 2 VL delivers vision-language capabilities at a fraction of the computational cost. This 12-billion-parameter open-source model processes videos, analyzes documents, and reas...