As LLMs become bigger and smarter over time, they chew through way more compute. It’s not just about size either, because today’s models like to “think out loud”, producing long runs of intermediate reasoning tokens before giving you an answer. Put those two trends together, and the demand for raw inference performance skyrockets.

NVIDIA Blackwell Ultra MLPerf Inference Benchmark

NVIDIA’s Blackwell Ultra architecture, which powers the flagship GB300 NVL72 system, looks more than capable of keeping up with those demands, judging by the latest MLPerf Inference v5.1 results. This round of benchmarks added some new heavyweights: DeepSeek-R1 with its massive 671B-parameter MoE architecture, Llama 3.1 in both 405B and 8B variants, and Whisper, which replaced RNN-T after blowing up on Hugging Face with nearly 5 million downloads in a month.

NVIDIA not only submitted results with its Blackwell GPUs but also debuted the new Blackwell Ultra architecture, and it smashed records across the board.

NVIDIA Blackwell Ultra MLPerf Inference Benchmark Record

On DeepSeek-R1, the GB300 NVL72 system delivered up to 45% higher performance per GPU compared to its own GB200 predecessor, and about 5x the throughput of Hopper-based systems. That kind of jump translates to much lower cost per token and higher AI factory output. A lot of it came down to clever optimizations: squeezing weights into NVFP4, a four-bit floating-point format, for higher throughput; converting KV caches to FP8 to shrink memory usage; and introducing new parallelism strategies that keep every GPU busy without creating bottlenecks.
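NVIDIA hasn’t published the exact recipe in this post, but the general shape of block-scaled 4-bit quantization is easy to illustrate. Below is a minimal numpy sketch, assuming a toy E2M1-style value grid and float32 per-block scales standing in for the FP8 scales the real NVFP4 format uses; the actual conversion happens inside TensorRT-LLM and the GPU, not in Python.

```python
import numpy as np

# Representable magnitudes of an FP4 E2M1 element (sign handled separately).
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_nvfp4_like(weights: np.ndarray, block: int = 16):
    """Round a 1-D weight vector to a 4-bit grid with one scale per block.

    NVFP4 pairs FP4 (E2M1) elements with a scale per 16-element block;
    this toy version keeps the scale in float32 for clarity.
    """
    w = weights.reshape(-1, block)
    # One scale per block so the largest element maps to the grid maximum.
    scales = np.abs(w).max(axis=1, keepdims=True) / FP4_GRID[-1]
    scales[scales == 0] = 1.0                      # avoid divide-by-zero
    # Snap each scaled magnitude to the nearest representable FP4 value.
    idx = np.abs(np.abs(w / scales)[..., None] - FP4_GRID).argmin(axis=-1)
    q = np.sign(w) * FP4_GRID[idx]
    return q, scales

def dequantize(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    return (q * scales).reshape(-1)

rng = np.random.default_rng(0)
w = rng.normal(size=256).astype(np.float32)
q, s = quantize_nvfp4_like(w)
print(f"mean abs quantization error: {np.abs(dequantize(q, s) - w).mean():.4f}")
```

The per-block scale is the whole trick: outliers only hurt the 16 weights sharing their block instead of the entire tensor, which is what makes four bits survivable for big models.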

The Llama 3.1 results pushed things further too, especially on the new 405B interactive benchmark, where the time-to-first-token and tokens-per-user requirements are even tighter. To hit those numbers, NVIDIA leaned on techniques like disaggregated serving, which splits prefill and decode across separate GPU pools, and NVLink-powered all-to-all GPU communication, unlocking nearly 1.5x better throughput per GPU compared to older setups.
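Disaggregated serving is simpler than it sounds: prefill is compute-bound and decode is memory-bandwidth-bound, so they get separate GPU pools with the KV cache handed off between them. The toy Python sketch below shows just the routing idea; the Request and Pool names are made up for illustration, and in the real system the handoff moves actual KV tensors over NVLink rather than a placeholder string.

```python
from dataclasses import dataclass, field
from collections import deque

@dataclass
class Request:
    prompt_tokens: int
    max_new_tokens: int
    kv_cache: object = None            # handed off after prefill

@dataclass
class Pool:
    name: str
    queue: deque = field(default_factory=deque)

# Two separate GPU pools that can be scaled independently.
prefill_pool = Pool("prefill")
decode_pool = Pool("decode")

def admit(req: Request):
    prefill_pool.queue.append(req)     # every request starts with prefill

def prefill_step():
    if prefill_pool.queue:
        req = prefill_pool.queue.popleft()
        req.kv_cache = f"kv[{req.prompt_tokens} tokens]"  # stand-in for real KV tensors
        decode_pool.queue.append(req)  # hand the populated cache to a decode GPU

def decode_step():
    if decode_pool.queue:
        req = decode_pool.queue[0]
        req.max_new_tokens -= 1        # generate one token per step
        if req.max_new_tokens == 0:
            decode_pool.queue.popleft()

admit(Request(prompt_tokens=512, max_new_tokens=3))
prefill_step()
for _ in range(3):
    decode_step()
print("decode queue drained:", not decode_pool.queue)
```

Because the two pools scale independently, a deployment can throw extra GPUs at whichever phase is the bottleneck, which is where the interactive benchmark’s tight time-to-first-token target gets met.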

When you put it all together, Blackwell Ultra is showing not just incremental gains but architectural leaps: higher memory capacity, stronger attention compute, and smarter software stacks, with TensorRT-LLM and CUDA Graphs making every cycle count.
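CUDA Graphs in particular are worth a concrete look: they record a fixed sequence of kernel launches once, then replay it with near-zero CPU launch overhead, which matters a lot when each decode step is tiny. TensorRT-LLM applies this internally; the sketch below shows the same idea through PyTorch’s public CUDA Graphs API, assuming a CUDA GPU is available and using a plain Linear layer as a stand-in for a real decode step.

```python
import torch

# Requires a CUDA GPU. The model here is just a stand-in workload.
model = torch.nn.Linear(4096, 4096).cuda().eval()
static_in = torch.randn(1, 4096, device="cuda")

# Warm up before capture so lazy allocations don't end up in the graph.
with torch.no_grad():
    for _ in range(3):
        model(static_in)
torch.cuda.synchronize()

# Capture one forward pass as a replayable graph of kernel launches.
graph = torch.cuda.CUDAGraph()
with torch.no_grad(), torch.cuda.graph(graph):
    static_out = model(static_in)      # recorded, not executed immediately

# Per step: copy fresh data into the static input buffer, then replay.
static_in.copy_(torch.randn(1, 4096, device="cuda"))
graph.replay()
print(static_out.shape)
```

The catch is that inputs and outputs must live in fixed buffers, which is why inference engines pad to static shapes before turning CUDA Graphs on.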

Click here to check out more numbers if you’re interested.
