Mixture of Experts Powers the Most Intelligent Frontier AI Models, Runs 10x Faster to Deliver 1/10 the Token Cost on NVIDIA Blackwell NVL72
Reducing the number of experts per GPU : Distributing experts across up to 72 GPUs reduces the number of experts per GPU, minimizing parameter-loading pressure on each GPU’s high-bandwidth memory. Fewer experts per GPU a…









