OpenAI Sidesteps Nvidia with Unusually Fast Coding Model on Plate-Sized Chips
OpenAI's GPT-5.3-Codex-Spark generates code up to 15 times faster on Cerebras wafer-scale chips, initially for Pro-tier users, to speed up real-time coding workflows.
- On Thursday, OpenAI launched GPT-5.3-Codex-Spark, a stripped-down coding model running on Cerebras Systems processors that delivers more than 1,000 tokens per second to Codex Pro users and select partners.
- To diversify suppliers, OpenAI last month signed a $10 billion contract with Cerebras to deploy up to 750 megawatts of compute capacity as it reduces its dependence on Nvidia.
- Technically, the Wafer Scale Engine 3 packs 4 trillion transistors and uses on-chip SRAM that is roughly 1,000x faster than HBM4, but its 44 GB of memory and 128,000-token context window limit Codex-Spark's multimodal capacity and benchmark performance.
- During the research preview, OpenAI rolled out persistent WebSocket connections and Responses API optimizations under a separate rate limit (a minimal usage sketch follows this list), and will expand access over the coming weeks.
- Strategically, the move signals OpenAI's effort to reshape the economics of inference latency by diversifying suppliers beyond Nvidia. It comes amid internal turmoil after the disbanding of the company's seven-member mission alignment team, and the productivity gains remain contested as competitors like Anthropic intensify pressure.
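For a concrete sense of what a low-latency model looks like from the client side, here is a minimal streaming sketch against the Responses API. The model identifier "gpt-5.3-codex-spark" is an assumption (the coverage does not state the API name), and the persistent WebSocket transport is presumably handled on OpenAI's side rather than exposed as a client option here.

```python
# Minimal sketch: streaming output from a low-latency coding model via the
# OpenAI Responses API. The model name below is hypothetical; the articles
# do not state the API identifier for GPT-5.3-Codex-Spark.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

stream = client.responses.create(
    model="gpt-5.3-codex-spark",  # hypothetical identifier
    input="Write a Python function that reverses a singly linked list.",
    stream=True,
)

# At 1,000+ tokens per second, the deltas below arrive fast enough that a
# completion feels near-instant in an interactive editor.
for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
print()
```

The point of streaming here is that throughput translates directly into perceived latency: the faster the deltas arrive, the sooner a usable suggestion appears in the editor.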
OpenAI sidesteps Nvidia with unusually fast coding model on plate-sized chips
On Thursday, OpenAI released its first production AI model to run on non-Nvidia hardware, deploying the new GPT-5.3-Codex-Spark coding model on chips from Cerebras. The model delivers code at more than 1,000 tokens (chunks of data) per second, reportedly roughly 15 times faster than its predecessor. For comparison, Anthropic's Claude Opus 4.6 in its new premium-priced fast mode reaches about 2.5 times its standard speed of 68.2 tokens p…
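Taking the excerpt's figures at face value, a quick back-of-the-envelope comparison (the 68.2 tokens/sec baseline and the 2.5x multiplier are from the excerpt above; the rest is arithmetic):

```python
# Rough throughput comparison implied by the figures in the article.
opus_standard = 68.2             # Claude Opus 4.6 standard speed, tokens/sec
opus_fast = 2.5 * opus_standard  # fast mode is about 2.5x standard (~171)
spark = 1000                     # Codex-Spark: more than 1,000 tokens/sec

print(f"Opus 4.6 fast mode: ~{opus_fast:.0f} tokens/sec")
print(f"Codex-Spark is roughly {spark / opus_fast:.1f}x faster")  # ~5.9x
```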
OpenAI's new Codex Spark model is built for speed
OpenAI's new GPT-5.3-Codex-Spark model is a bit of a departure for the company's family of Codex software development models: its focus is squarely on reducing latency. Powered by Cerebras' 125-petaflop Wafer Scale Engine 3, the Codex Spark model is meant for use cases where latency matters as much as, or more than, intelligence. And fast it is: Codex Spark can deliver more than 1,000 tokens per second. When OpenAI launched GPT-5.3-Codex only a f…
OpenAI deploys Cerebras chips for 'near-instant' code generation in first major move beyond Nvidia
OpenAI on Thursday launched GPT-5.3-Codex-Spark, a stripped-down coding model engineered for near-instantaneous response times, marking the company's first significant inference partnership outside its traditional Nvidia-dominated infrastructure. The model runs on hardware from Cerebras Systems, a Sunnyvale-based chipmaker whose wafer-scale processors specialize in low-latency AI workloads. The partnership arrives at a pivotal moment for OpenAI. …
OpenAI launches GPT-5.3-Codex-Spark, powered by Cerebras' Wafer Scale Engine 3 chip, for ultra-fast real-time coding, 15 times faster than its predecessor
OpenAI launches GPT-5.3-Codex-Spark for ultra-fast real-time coding. Powered by Cerebras' Wafer Scale Engine 3 chip, Spark would allow for faster inference and is the first important step in the multi-year partnership between OpenAI and Cerebras. The original GPT-5.3-Codex model serves …
Coverage Details
Bias Distribution
- 100% of the sources are Center