Nvidia launches BlueField-4 STX storage architecture for agentic AI at GTC 2026

Nvidia announced BlueField-4 STX at GTC 2026 on March 16, a modular reference architecture for accelerated storage designed to address the data access bottleneck limiting agentic AI inference.

Built around a new storage-optimized BlueField-4 DPU and ConnectX-9 SuperNIC, the platform targets GPU underutilization that occurs when AI agents operating across extended sessions and expanding context windows exceed the throughput of conventional storage paths. Nvidia says STX delivers up to five times the token throughput, four times better energy efficiency, and twice the page ingestion speed compared with traditional CPU-based storage architectures.

The specific issue that Nvidia is targeting with STX is KV cache management. During transformer inference, the attention mechanism computes KV pairs for every token in context, which must be stored and retrieved for each subsequent generation step. But these context windows are growing into the hundreds of thousands of tokens, meaning that the KV cache is outgrowing GPU HBM capacity. The usual fallback is to offload to host DRAM or NVMe storage, but both routes pass through the CPU, adding latency that compounds with context length and stalls GPU execution as data transits.
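To see why the KV cache outgrows GPU memory, it helps to estimate its size. The sketch below computes KV cache footprint from standard transformer parameters (2 tensors per layer for K and V, one `head_dim` vector per KV head per token); the model configuration is a hypothetical 70B-class grouped-query-attention setup chosen for illustration, not a specific Nvidia figure.

```python
def kv_cache_bytes(seq_len, n_layers=80, n_kv_heads=8, head_dim=128, dtype_bytes=2):
    """Approximate KV cache size for one sequence.

    2 tensors (K and V) per layer; each stores one head_dim vector
    per KV head per token, at dtype_bytes per element (2 for FP16/BF16).
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * dtype_bytes

# Hypothetical 70B-class GQA config at growing context lengths:
for tokens in (8_000, 128_000, 1_000_000):
    gib = kv_cache_bytes(tokens) / 2**30
    print(f"{tokens:>9,} tokens -> {gib:7.1f} GiB per sequence")
```

At these assumed parameters the cache costs roughly 320 KiB per token, so a 128K-token context already needs about 39 GiB per sequence, and a million-token context over 300 GiB, well past the HBM capacity of a single GPU even before model weights are counted. That is the gap that spilling to DRAM or NVMe, and in turn STX's CPU-bypass path, is meant to cover.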

STX bypasses the host CPU by routing data through a dedicated accelerated storage layer via RDMA over Spectrum-X Ethernet. BlueField-4 manages NVMe SSDs directly and handles data integrity and encryption for the KV cache, keeping context accessible at the storage processor rather than transiting the host. The full stack runs on the Vera Rubin platform and integrates the Vera CPU — also announced at GTC on March 16 — alongside ConnectX-9, Spectrum-X Ethernet, DOCA software, and AI Enterprise software. The first rack-scale implementation built on STX is the Nvidia CMX context memory storage platform.

Storage and infrastructure vendors co-designing systems based on STX include DDN, Dell Technologies, HPE, IBM, NetApp, and VAST Data, alongside manufacturing partners AIC, Supermicro, and Quanta Cloud Technology. Meanwhile, eight cloud and AI providers — including CoreWeave, Lambda, Mistral AI, and Oracle Cloud Infrastructure — committed to early adoption for context memory storage. STX-based platforms are expected from partners in the second half of 2026.

"Agentic AI is redefining what software can do — and the computing infrastructure behind it must be reinvented to keep pace," Jensen Huang, founder and CEO of Nvidia, said at GTC. "AI systems that reason across massive context and continuously learn require a new class of storage."
