Published 2 years ago • Updated 2 years ago

Cerebras Inference: Llama 3.1-70B Hits 2,100 Tokens/s

Cerebras Systems has announced its largest update to the Cerebras Inference platform since launch. The update lets Cerebras Inference run the Llama 3.1-70B model at 2,100 tokens per second, three times the throughput of the previous version. […]
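For a rough sense of scale, the sketch below turns the headline figure into per-token latency and back-derives the pre-update speed implied by "three times". It assumes the 3x ratio is exact and that throughput stays constant over a whole generation; neither is guaranteed by the announcement, and the helper names are hypothetical.

```python
# Back-of-envelope arithmetic on the reported figures.
# Assumptions: the 3x speedup is exact and throughput is constant.
NEW_TOKENS_PER_S = 2100
SPEEDUP = 3


def implied_previous_rate(new_rate: float, speedup: float) -> float:
    """Throughput before the update implied by a speedup ratio."""
    return new_rate / speedup


def ms_per_token(rate: float) -> float:
    """Average time per output token, in milliseconds."""
    return 1000.0 / rate


def seconds_for(tokens: int, rate: float) -> float:
    """Time to emit a given number of tokens at a constant rate."""
    return tokens / rate


previous = implied_previous_rate(NEW_TOKENS_PER_S, SPEEDUP)
print(f"implied previous rate: {previous:.0f} tokens/s")
print(f"per-token time now: {ms_per_token(NEW_TOKENS_PER_S):.2f} ms")
print(f"1,000 tokens now: {seconds_for(1000, NEW_TOKENS_PER_S):.2f} s")
```

Under these assumptions the earlier version ran at about 700 tokens/s, and a 1,000-token answer now takes roughly half a second.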



01net

Cerebras Triples its Industry-Leading Inference Performance, Setting New All Time Record

Cerebras Inference delivers 2,100 tokens/second for Llama 3.2 70B: 16x the performance of the fastest GPUs and 68x faster than hyperscale clouds.

SUNNYVALE, Calif.–(BUSINESS WIRE)–Today, Cerebras Systems, the pioneer in high performance AI compute, smashed its previous industry record for inference, delivering 2,100 tokens/second performance on Llama 3.2 70B. This is 16x faster than any […]

2 years ago

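The press release states its comparisons as multipliers rather than absolute numbers. The sketch below recovers the baseline throughputs those multipliers imply, assuming each "Nx" is an exact ratio of output tokens per second on the same model. The release does not say this, and the figures are likely rounded. The `implied_baselines` helper is hypothetical.

```python
# Assumption: each claimed speedup is an exact ratio of output throughput
# measured on the same model; the real figures are likely rounded.
CEREBRAS_TOKENS_PER_S = 2100
CLAIMED_SPEEDUPS = {
    "fastest GPUs": 16,
    "hyperscale clouds": 68,
}


def implied_baselines(rate: float, speedups: dict[str, float]) -> dict[str, float]:
    """Map each comparison target to the throughput its speedup implies."""
    return {name: rate / factor for name, factor in speedups.items()}


for name, baseline in implied_baselines(CEREBRAS_TOKENS_PER_S, CLAIMED_SPEEDUPS).items():
    print(f"{name}: ~{baseline:.0f} tokens/s")
```

Read this way, the claims put the fastest GPU setups at roughly 131 tokens/s and hyperscale clouds at roughly 31 tokens/s for the same model.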