Nvidia slaps Groq into new LPX racks for faster AI response

Nvidia’s integration of 256 Groq 3 LPUs with its Vera Rubin racks aims to boost large-language-model inference throughput by up to 35× on trillion-parameter models.

  • On Monday, Nvidia announced at GTC that it will integrate Groq 3 LPUs into its Vera Rubin NVL72 rack system, saying 'We're in production with the Groq chip'.
  • To speed decoding, Nvidia pairs Groq 3 LPUs, acting as decode accelerators, with Rubin GPUs: the two chip types jointly compute every layer for each output token, exploiting SRAM's much higher bandwidth and spreading the work across many chips to offset low per-chip capacity (sketched in code after this list).
  • Each Groq 3 LPU delivers 1.2 petaFLOPS and carries 500 MB of on-chip memory; Nvidia plans LPX racks of 256 LPUs with 128 GB of aggregate on-chip SRAM and 640 TB/s of bandwidth. As Ian Buck put it, 'The tokens per second per chip is actually quite low.'
  • Given steep per-chip costs, the systems are likely to be adopted first by major AI companies such as OpenAI, Anthropic, and Meta, while Nvidia wagers inference providers could charge $45 per million tokens.
  • Because each LPU holds so little on-chip memory, roughly a thousand of them are needed to serve a 1-trillion-parameter model (see the arithmetic after this list). Nvidia plans to ship the systems later this year, with Samsung manufacturing the LPUs.
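
The decode split described in the second bullet is easier to see in code. Below is a minimal, purely illustrative sketch assuming the common disaggregated-serving pattern, in which GPUs handle the prompt (prefill) phase and LPUs handle the token-by-token decode phase; the announcement only calls the LPUs decode accelerators, so this division of labor is our reading, and every class and method name here is hypothetical.

```python
# Toy, self-contained sketch: Rubin GPUs run the compute-bound prefill over
# the whole prompt; Groq LPUs run the bandwidth-bound decode loop, in which
# every layer is evaluated for each single output token. All class and
# method names here are illustrative, not real Nvidia or Groq APIs.

class GpuPool:
    """Stand-in for the Rubin GPUs (prompt/prefill phase)."""
    def prefill(self, prompt):
        # Real prefill builds a KV cache by attending over the full prompt;
        # a list of seen tokens stands in for that cache here.
        return list(prompt)

class LpuPool:
    """Stand-in for the Groq 3 LPUs (per-token decode phase)."""
    def decode_step(self, kv_cache, token):
        # Each decode step streams every layer's weights once for a single
        # token, so real-world speed hinges on memory bandwidth -- the SRAM
        # advantage the announcement leans on.
        kv_cache.append(token)
        return token + 1, kv_cache  # trivial next-token stub

def generate(prompt, gpus, lpus, max_new_tokens=4):
    kv_cache = gpus.prefill(prompt)        # one-shot, compute-bound (GPU)
    token, out = prompt[-1], []
    for _ in range(max_new_tokens):        # serial, bandwidth-bound (LPU)
        token, kv_cache = lpus.decode_step(kv_cache, token)
        out.append(token)
    return out

print(generate([1, 2, 3], GpuPool(), LpuPool()))  # -> [4, 5, 6, 7]
```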
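
The hardware figures in the bullets also check out arithmetically. A quick back-of-the-envelope verification follows; the 4-bit weight precision in the second half is our assumption, not something Nvidia stated.

```python
# Sanity check on the published figures. The 4-bit-weight assumption in the
# second half is ours; the announcement does not state a weight precision.

lpus_per_rack   = 256     # LPX rack, per the announcement
sram_per_lpu_gb = 0.5     # 500 MB of on-chip memory per Groq 3 LPU

rack_sram_gb = lpus_per_rack * sram_per_lpu_gb
print(f"Rack SRAM: {rack_sram_gb:.0f} GB")        # 128 GB, matching Nvidia

# Why "about a thousand LPUs" for a 1-trillion-parameter model:
params = 1e12
bytes_per_weight = 0.5    # assumption: 4-bit (e.g. FP4) quantized weights
weights_gb = params * bytes_per_weight / 1e9      # 500 GB of weights
print(f"LPUs to hold 1T params: {weights_gb / sram_per_lpu_gb:.0f}")  # ~1000
```

At roughly 1,000 LPUs per trillion-parameter model, serving one such model would span about four 256-LPU LPX racks, which fits the bullet below on these systems landing first with the largest AI companies.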

14 Articles


Bias Distribution

  • 67% of the sources are Center


se7en.ws broke the news on Monday, March 16, 2026.
