OpenAI Sidesteps Nvidia with Unusually Fast Coding Model on Plate-Sized Chips
OpenAI's GPT-5.3-Codex-Spark generates code up to 15 times faster on Cerebras wafer-scale chips, initially for Pro-tier users, to speed up real-time coding workflows.
- On Thursday, OpenAI launched GPT-5.3-Codex-Spark, a stripped-down coding model running on Cerebras Systems processors that delivers more than 1,000 tokens per second to Codex Pro users and select partners.
- To diversify suppliers, OpenAI last month signed a $10 billion contract with Cerebras to deploy up to 750 megawatts of compute capacity as it reduces its dependence on Nvidia.
- Technically, the Wafer Scale Engine 3 packs 4 trillion transistors and uses on-chip SRAM that is roughly 1,000x faster than HBM4, but its 44 GB of memory and 128,000-token context window limit Codex-Spark's multimodal capacity and benchmark performance.
- During the research preview, OpenAI rolled out persistent WebSocket connections and Responses API optimizations under a separate rate limit (a minimal usage sketch follows this list), and will expand access over the coming weeks.
- Strategically, the move signals OpenAI's effort to reshape the economics of inference latency by diversifying suppliers beyond Nvidia. It comes amid internal turmoil after the disbanding of the company's seven-member mission alignment team, and the productivity gains remain contested as competitors like Anthropic intensify pressure.
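For a concrete sense of what a low-latency model looks like from the client side, here is a minimal streaming sketch against the Responses API. The model identifier "gpt-5.3-codex-spark" is an assumption (the coverage does not state the API name), and the persistent WebSocket transport is presumably handled on OpenAI's side rather than exposed as a client option here.

```python
# Minimal sketch: streaming output from a low-latency coding model via the
# OpenAI Responses API. The model name below is hypothetical; the articles
# do not state the API identifier for GPT-5.3-Codex-Spark.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

stream = client.responses.create(
    model="gpt-5.3-codex-spark",  # hypothetical identifier
    input="Write a Python function that reverses a singly linked list.",
    stream=True,
)

# At 1,000+ tokens per second, the deltas below arrive fast enough that a
# completion feels near-instant in an interactive editor.
for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
print()
```

The point of streaming here is that throughput translates directly into perceived latency: the faster the deltas arrive, the sooner a usable suggestion appears in the editor.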
OpenAI sidesteps Nvidia with unusually fast coding model on plate-sized chips
On Thursday, OpenAI released its first production AI model to run on non-Nvidia hardware, deploying the new GPT-5.3-Codex-Spark coding model on chips from Cerebras. The model delivers code at more than 1,000 tokens (chunks of data) per second, reportedly roughly 15 times faster than its predecessor. For comparison, Anthropic's Claude Opus 4.6 in its new premium-priced fast mode reaches about 2.5 times its standard speed of 68.2 tokens p…
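Taking the excerpt's figures at face value, a quick back-of-the-envelope comparison (the 68.2 tokens/sec baseline and the 2.5x multiplier are from the excerpt above; the rest is arithmetic):

```python
# Rough throughput comparison implied by the figures in the article.
opus_standard = 68.2             # Claude Opus 4.6 standard speed, tokens/sec
opus_fast = 2.5 * opus_standard  # fast mode is about 2.5x standard (~171)
spark = 1000                     # Codex-Spark: more than 1,000 tokens/sec

print(f"Opus 4.6 fast mode: ~{opus_fast:.0f} tokens/sec")
print(f"Codex-Spark is roughly {spark / opus_fast:.1f}x faster")  # ~5.9x
```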
OpenAI's new Codex Spark model is built for speed
OpenAI's new GPT-5.3-Codex-Spark model is a bit of a departure for the company's family of Codex software development models: its focus is squarely on reducing latency. Powered by Cerebras' 125-petaflop Wafer Scale Engine 3, the Codex Spark model is meant for use cases where latency matters as much as, or more than, intelligence. And fast it is: Codex Spark can deliver more than 1,000 tokens per second. When OpenAI launched GPT-5.3-Codex only a f…
OpenAI deploys Cerebras chips for 'near-instant' code generation in first major move beyond Nvidia
OpenAI on Thursday launched GPT-5.3-Codex-Spark, a stripped-down coding model engineered for near-instantaneous response times, marking the company's first significant inference partnership outside its traditional Nvidia-dominated infrastructure. The model runs on hardware from Cerebras Systems, a Sunnyvale-based chipmaker whose wafer-scale processors specialize in low-latency AI workloads. The partnership arrives at a pivotal moment for OpenAI. …
OpenAI launches GPT-5.3-Codex-Spark, powered by Cerebras' Wafer Scale Engine 3 chip, for ultra-fast real-time coding, 15 times faster than its predecessor
OpenAI launches GPT-5.3-Codex-Spark for ultra-fast real-time coding. Powered by Cerebras' Wafer Scale Engine 3 chip, Spark would allow for faster inference and is the first important step in the multi-year partnership between OpenAI and Cerebras. The original GPT-5.3-Codex model serves …
Coverage Details
Bias Distribution
- 100% of the sources are Center