How Nvidia’s $20 billion Groq 3 LPU deal reshapes the Nvidia Vera Rubin Platform — Samsung 4nm process serves as bedrock for SRAM-based AI accelerator chip

The LP30 chip at the heart of the Groq 3 LPX rack carries 512 MB of on-chip SRAM per die, delivering 150 TB/s of memory bandwidth. That figure dwarfs the 22 TB/s available from the 288 GB of HBM4 on each Rubin GPU. A full LPX rack houses 256 LPUs for a total of 128 GB of SRAM and 40 PB/s of aggregate bandwidth. Nvidia claims the LPX rack, paired with a Vera Rubin NVL72, delivers 35 times higher throughput per megawatt than a Blackwell NVL72 alone for trillion-parameter models, at a target price point of $45 per million tokens.
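The rack-level figures follow directly from the per-die numbers. A quick sanity check (using binary gigabytes for the capacity figure, which is how the 128 GB total works out):

```python
# Back-of-the-envelope check of the Groq 3 LPX rack figures quoted above.
# All inputs come from the article; nothing here is measured independently.

SRAM_PER_DIE_MB = 512   # on-chip SRAM per LP30 die
BW_PER_DIE_TBS = 150    # memory bandwidth per die, TB/s
LPUS_PER_RACK = 256     # LPUs in a full LPX rack
HBM4_BW_TBS = 22        # HBM4 bandwidth per Rubin GPU, for comparison

total_sram_gb = SRAM_PER_DIE_MB * LPUS_PER_RACK / 1024   # binary GB
total_bw_pbs = BW_PER_DIE_TBS * LPUS_PER_RACK / 1000     # PB/s

print(f"Rack SRAM: {total_sram_gb:.0f} GB")              # 128 GB
print(f"Rack bandwidth: {total_bw_pbs:.1f} PB/s")        # 38.4 PB/s, quoted as ~40
print(f"Per-die advantage over HBM4: {BW_PER_DIE_TBS / HBM4_BW_TBS:.1f}x")
```

The aggregate bandwidth works out to 38.4 PB/s exactly, which Nvidia rounds to 40 PB/s in its marketing material.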

Rubin GPUs handle the compute-intensive prefill phase of a query, processing long input contexts, while Groq LPUs take over the decode phase, generating output tokens at low latency. Nvidia's Dynamo orchestration platform manages the split across heterogeneous hardware, distributing workloads based on batch size and parallelism requirements.
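The prefill/decode split described above can be sketched as a simple router. This is an illustrative toy, not Dynamo's actual API; the class names, phases, and handoff are assumptions made for clarity:

```python
from dataclasses import dataclass, field

# Toy sketch of disaggregated inference serving: the compute-heavy prefill
# of each request goes to GPU workers, while the latency-sensitive decode
# loop goes to LPU workers. Illustrative only -- not Nvidia Dynamo's API.

@dataclass
class Request:
    prompt_tokens: int    # input context length (prefill cost)
    max_new_tokens: int   # output length (decode cost)

@dataclass
class WorkerPool:
    name: str
    queue: list = field(default_factory=list)

    def submit(self, req, phase):
        self.queue.append((phase, req))

class DisaggregatedRouter:
    def __init__(self, gpu_pool, lpu_pool):
        self.gpu_pool = gpu_pool   # Rubin GPUs: long-context prefill
        self.lpu_pool = lpu_pool   # Groq LPUs: token-by-token decode

    def route(self, req):
        # Prefill processes the entire input context in one batched pass,
        # then the KV state is handed off and decode runs on SRAM-fed LPUs.
        self.gpu_pool.submit(req, "prefill")
        self.lpu_pool.submit(req, "decode")

gpus = WorkerPool("rubin-gpus")
lpus = WorkerPool("groq-lpus")
router = DisaggregatedRouter(gpus, lpus)
router.route(Request(prompt_tokens=32_000, max_new_tokens=512))
```

A real orchestrator would also weigh batch size and parallelism when placing work, as the article notes, and would manage the KV-cache transfer between the two pools.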

The original, pre-Nvidia Groq LPU design used a fixed Very Long Instruction Word (VLIW) pipeline and large on-chip SRAM pools, with the compiler pre-scheduling the entire execution path at compile time. That meant deterministic latency with no cache misses or stalls, and the chips demonstrated raw single-user token rates in the thousands per second. But the architecture's weakness was always capacity: at 230 MB of SRAM per chip in prior generations, fitting even medium-sized models required high chip counts, and the architecture was initially designed for convolutional neural networks.
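The capacity problem is easy to quantify. Assuming FP8 weights at one byte per parameter (the model sizes below are generic examples, not figures from the article), the chip counts climb quickly:

```python
import math

# Rough illustration of why 230 MB of SRAM per chip forces high chip counts.
# Assumes FP8 weights (1 byte per parameter) and ignores KV cache and
# activations, which only make the requirement worse.

SRAM_PER_CHIP_MB = 230  # prior-generation Groq LPU

for params_b in (7, 70):                 # 7B- and 70B-parameter models
    weights_mb = params_b * 1000         # FP8 weights in MB
    chips = math.ceil(weights_mb / SRAM_PER_CHIP_MB)
    print(f"{params_b}B model: at least {chips} chips for weights alone")
```

Even a 7B model needs over 30 chips just to hold its weights; a 70B model needs more than 300. Doubling per-die SRAM to 512 MB in the LP30 roughly halves those counts but does not eliminate the scaling pressure.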

The Groq LP30 addresses some of these limitations with 512 MB of SRAM per die and 1.23 PFLOPS of FP8 compute. Samsung has ramped production from roughly 9,000 wafers to about 15,000 wafers as output shifts from samples to commercial manufacturing. At GTC, AWS announced that it will deploy Groq 3 LPUs alongside more than one million Nvidia GPUs as part of an expanded partnership.

Beyond the LP30, a future LP35 will add NVFP4 support, aligning with the Rubin Ultra generation, and an LP40 is planned for the Feynman architecture cycle after that.

One conspicuous absence from GTC was the Rubin CPX, a GDDR7-based inference accelerator announced in September 2025 as part of the Vera Rubin platform; it was absent from all keynote slides and received no stage time. It appears — though not officially confirmed — that the CPX has been removed from Nvidia’s roadmap entirely, replaced in the platform hierarchy by Groq 3 LPX.

Rubin CPX was designed to use cheaper, more available GDDR7 memory to accelerate the context phase of inference at lower power. But the Groq LPU offers higher bandwidth without requiring large quantities of any external memory, which is ideal in a market where HBM supply remains constrained, and GDDR7 production is still scaling. Off-roadmap parts could still ship to customers who have already invested in CPX software optimization, but there’s a clear shift in priorities at Nvidia.

There's also a striking parallel with the Mellanox acquisition in 2019, which ultimately turned Nvidia's NVLink and InfiniBand technologies into foundational infrastructure for AI clusters. Groq appears to be following a similar trajectory, with startup technology absorbed into the platform as a permanent new architectural layer.

Nvidia's Groq deal is the largest in a wave of inference-focused acquisitions that swept through the semiconductor industry in 2025. In June, AMD acquired the engineering team from Untether AI , a RISC-V inference chip developer, after the startup shut down, and Nvidia itself paid over $900 million for networking startup Enfabrica's team and IP in September. Meta acquired custom-chip startup Rivos in October, and Intel attempted to buy SambaNova for a reported $1.6 billion, but the talks collapsed; the two companies settled on a $350 million investment and multi-year partnership last month instead.
