Nvidia slaps Groq into new LPX racks for faster AI response
Nvidia’s integration of 256 Groq 3 LPUs with Vera Rubin racks aims to boost large language model inference throughput by up to 35× on trillion-parameter models.
- On Monday at GTC, Nvidia announced that it will integrate Groq 3 LPUs into its Vera Rubin NVL72 rack system, saying "We're in production with the Groq chip."
- To speed decoding, Nvidia pairs Groq 3 LPUs with Rubin GPUs as decode accelerators: the chips jointly compute every layer for each output token, exploiting SRAM's higher bandwidth, and because per-chip capacity is low, many LPUs are deployed together.
- Each Groq 3 LPU delivers 1.2 petaFLOPS and 500 MB of memory; Nvidia plans LPX racks with 256 LPUs, 128 GB of on-chip SRAM, and 640 TB/s of bandwidth, with Ian Buck saying "The tokens per second per chip is actually quite low."
- Given steep per-chip costs, the systems are likely to be adopted first by major AI companies such as OpenAI, Anthropic, and Meta, while Nvidia wagers inference providers could charge $45 per million tokens.
- Because LPUs have limited on-chip memory, about a thousand of them are needed to hold a 1 trillion-parameter model; Nvidia plans to ship the systems later this year, with Samsung manufacturing the LPUs.
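The rack math above can be sanity-checked with a back-of-the-envelope calculation. The sketch below uses only figures from the summary (500 MB of SRAM per LPU, i.e. 128 GB across 256 chips) and tries a few assumed weight precisions, since the article does not say which quantization Nvidia has in mind; the "about a thousand LPUs" figure falls out if weights are stored at roughly 4 bits each.

```python
# Back-of-the-envelope: how many 500 MB LPUs does a 1-trillion-parameter model need?
PARAMS = 1e12           # 1 trillion parameters (from the article)
SRAM_PER_LPU_GB = 0.5   # 128 GB of SRAM / 256 LPUs = 500 MB per chip (from the article)

for bits in (16, 8, 4):                      # assumed weight precisions
    weights_gb = PARAMS * bits / 8 / 1e9     # total weight footprint in GB
    chips = weights_gb / SRAM_PER_LPU_GB     # LPUs needed just to hold the weights
    print(f"{bits}-bit weights: {weights_gb:.0f} GB -> ~{chips:.0f} LPUs")
```

This ignores KV-cache and activation memory, so the real chip count would be somewhat higher; it is only meant to show that the claimed ~1,000-LPU figure is consistent with 4-bit weights.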
14 Articles
Analysis: Is Nvidia's Groq deal the endgame for AI chip startups?
At its 2026 GTC conference, Nvidia not only unveiled its Vera CPU but also officially launched the Groq 3 LPU chip, developed through a prior technology licensing arrangement with Groq and brought into its own ecosystem. Alongside it, Nvidia introduced the Groq 3 LPX platform - a server rack composed of 128 Groq 3 LPUs that can be directly integrated with the Vera Rubin solution. The move signals that Nvidia has successfully absorbed Groq's tech…
Decoding the Future of Inference At NVIDIA: Groq LPUs Join Vera Rubin Platform For Low-Latency Inference
With its upcoming Vera Rubin rack-scale architecture, NVIDIA is going to be integrating LPUs from acquihire Groq, marking a major expansion beyond using GPUs alone for AI inference. The post Decoding the Future of Inference At NVIDIA: Groq LPUs Join Vera Rubin Platform For Low-Latency Inference appeared first on ServeTheHome.
GTC 2026: With Groq 3 LPX, Nvidia adds dedicated inference hardware to its platform for the first time
At GTC 2026, Nvidia expanded the Vera Rubin platform it introduced at CES with custom CPU racks, dedicated inference chips, a new storage architecture, an inference operating system, open model alliances, and agent security software. The article GTC 2026: With Groq 3 LPX, Nvidia adds dedicated inference hardware to its platform for the first time appeared first on The Decoder.
Coverage Details
Bias Distribution
- 67% of the sources are Center