1 article

NVIDIA's Nemotron 3 Super combines latent mixture of experts with hybrid Mamba architecture - 120B total parameters, 12B active per token, 1M context, and up to 4x more experts at the same cost.
Looking for tools and guides about Mamba too?
View Mamba Topic Hub
New tutorials, open-source projects, and deep dives on coding agents - delivered weekly.
Explore 160 topics
Browse All Topics