Nvidia slaps Groq into new LPX racks for faster AI response
Nvidia’s integration of 256 Groq 3 LPUs with Vera Rubin racks aims to boost large language model inference throughput by up to 35× on trillion-parameter models.
- On Monday at GTC, Nvidia announced that it will integrate Groq 3 LPUs into its Vera Rubin NVL72 rack system, saying "We're in production with the Groq chip."
- To speed decoding, Nvidia pairs Groq 3 LPUs with Rubin GPUs as decode accelerators: the chips jointly compute every layer for each output token, exploiting SRAM's higher bandwidth, and because per-chip capacity is low, many LPUs are deployed together.
- Each Groq 3 LPU delivers 1.2 petaFLOPS and 500 MB of memory; Nvidia plans LPX racks with 256 LPUs, 128 GB of on-chip SRAM, and 640 TB/s of bandwidth, with Ian Buck saying "The tokens per second per chip is actually quite low."
- Given steep per-chip costs, the systems are likely to be adopted first by major AI companies such as OpenAI, Anthropic, and Meta, while Nvidia wagers inference providers could charge $45 per million tokens.
- Because LPUs have limited on-chip memory, about a thousand of them are needed to hold a 1 trillion-parameter model; Nvidia plans to ship the systems later this year, with Samsung manufacturing the LPUs.
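The rack math above can be sanity-checked with a back-of-the-envelope calculation. The sketch below uses only figures from the summary (500 MB of SRAM per LPU, i.e. 128 GB across 256 chips) and tries a few assumed weight precisions, since the article does not say which quantization Nvidia has in mind; the "about a thousand LPUs" figure falls out if weights are stored at roughly 4 bits each.

```python
# Back-of-the-envelope: how many 500 MB LPUs does a 1-trillion-parameter model need?
PARAMS = 1e12           # 1 trillion parameters (from the article)
SRAM_PER_LPU_GB = 0.5   # 128 GB of SRAM / 256 LPUs = 500 MB per chip (from the article)

for bits in (16, 8, 4):                      # assumed weight precisions
    weights_gb = PARAMS * bits / 8 / 1e9     # total weight footprint in GB
    chips = weights_gb / SRAM_PER_LPU_GB     # LPUs needed just to hold the weights
    print(f"{bits}-bit weights: {weights_gb:.0f} GB -> ~{chips:.0f} LPUs")
```

This ignores KV-cache and activation memory, so the real chip count would be somewhat higher; it is only meant to show that the claimed ~1,000-LPU figure is consistent with 4-bit weights.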
14 Articles
Analysis: Is Nvidia's Groq deal the endgame for AI chip startups?
At its 2026 GTC conference, Nvidia not only unveiled its Vera CPU but also officially launched the Groq 3 LPU chip, developed through a prior technology licensing arrangement with Groq and brought into its own ecosystem. Alongside it, Nvidia introduced the Groq 3 LPX platform - a server rack composed of 128 Groq 3 LPUs that can be directly integrated with the Vera Rubin solution. The move signals that Nvidia has successfully absorbed Groq's tech…
Decoding the Future of Inference At NVIDIA: Groq LPUs Join Vera Rubin Platform For Low-Latency Inference
With its upcoming Vera Rubin rack-scale architecture, NVIDIA is going to be integrating LPUs from acquihire Groq, marking a major expansion beyond using GPUs alone for AI inference. The post Decoding the Future of Inference At NVIDIA: Groq LPUs Join Vera Rubin Platform For Low-Latency Inference appeared first on ServeTheHome.
GTC 2026: With Groq 3 LPX, Nvidia adds dedicated inference hardware to its platform for the first time
At GTC 2026, Nvidia expanded the Vera Rubin platform it introduced at CES with custom CPU racks, dedicated inference chips, a new storage architecture, an inference operating system, open model alliances, and agent security software. The article GTC 2026: With Groq 3 LPX, Nvidia adds dedicated inference hardware to its platform for the first time appeared first on The Decoder.
Coverage Details
Bias Distribution
- 67% of the sources are Center