Published 6 days ago • Updated 5 days ago
MOREH Demonstrates Production-Ready LLM Inference on Tenstorrent Galaxy, Achieving DGX A100-Class Performance with Improved Cost Efficiency
Moreh said its MoAI Inference Framework matched or surpassed NVIDIA DGX A100-class performance while cutting infrastructure costs with Tenstorrent processors.
On Friday, May 1, 2026, AI infrastructure company Moreh, led by CEO Gangwon Jo, announced validated LLM inference performance on the Tenstorrent Galaxy Wormhole system using its proprietary MoAI Inference Framework at the TT-Deploy launch event in San Francisco.
The MoAI Inference Framework enables unified operation of heterogeneous GPUs and NPUs—including NVIDIA, AMD, and Tenstorrent chips—within a single cluster, allowing enterprises to build flexible AI infrastructure strategies without vendor lock-in.
Tests across leading Mixture-of-Experts models—including GPT-OSS, Qwen, GLM, and DeepSeek—showed the system matches or surpasses NVIDIA DGX A100-class performance, demonstrating a viable alternative to conventional GPU-centric infrastructure.
By deploying Tenstorrent processors as dedicated prefill accelerators, the company reduced its reliance on high-cost HBM, improving overall cost efficiency while maintaining production-grade stability for real-world data centers.
"Achieving production-grade LLM inference performance and stability on Tenstorrent-based systems marks a significant milestone," Jo stated, adding that the company intends to pursue deeper optimization across heterogeneous architectures and closer integration with Tenstorrent NPUs.