Published 12 days ago • Updated 10 days ago

Meta accused of Llama 4 bait-n-switch to juice LMArena rank

Summary by The Register

Did Facebook giant rizz up LLM to win over human voters? It appears so

6 Articles

The Register

Center

Meta accused of Llama 4 bait-n-switch to juice LMArena rank

Did Facebook giant rizz up LLM to win over human voters? It appears so

11 days ago


Startups News | Tech News

Lean Right

Llama 4 Scandal: Meta’s release of Llama 4 overshadowed by cheating allegations on AI benchmark - Tech Startups

Meta rolled out its much-hyped Llama 4 models this past weekend, touting big performance gains and new multimodal capabilities. But the rollout hasn’t gone as planned. What was supposed to mark a new chapter in Meta’s AI playbook is now […]

11 days ago


Developpez.com

Meta rigged tests to give the impression that its new Llama 4 AI model is better than the competition, casting further doubt on the relevance of AI benchmark results

Meta recently launched the Llama 4 model family: Llama 4 Scout, Llama 4 Maverick and Llama 4 Behemoth. The company stated that each of these models is the best in its category. For example, Meta stated that Llama 4 Maverick offers the best performance-to-cost ratio in its category. However, there appears to have been some trickery during the evaluation process: the version of Maverick tested on LMArena is not the same as the version made availab…

10 days ago


IT PRO

Meta executive denies hyping up Llama 4 benchmark scores – but what can users expect from the new models?

A senior figure at Meta has denied claims that the tech giant boosted performance metrics for its new Llama 4 AI model range following rumors online.

11 days ago


CIO

Benchmarks alone are not enough to evaluate AI performance: the Meta Llama 4 controversy shows the need for real-world validation

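Benchmarks are a key criterion when evaluating AI performance, because they offer a way to identify strengths and weaknesses across aspects such as a model's reliability, accuracy, and range of use. But allegations that Meta inflated the performance of its new generative AI model, Llama 4, have recently heightened concern about the accuracy and validity of AI benchmark results. In particular, model developers often tune their algorithms to do well on specific benchmarks, which calls the reliability of those benchmarks into question. Dave Schubmehl, research vice president for AI and automation at IDC, said that “organizations need to verify model performance claims for themselves,” adding that “differences in the actual operating environment, data, or prompts alone can be enough to change the results.” IT buyers still paying attention despite possible result manipulation. Last Saturday, Meta's new Llama models Scout and Maveric…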

11 days ago


TechnoCodex

Llama 4 Maverick Benchmark Scandal: Meta Slammed

April 8, 2025 – Meta’s Llama 4 Maverick AI model, touted as a best-in-class multimodal AI, is at the center of a growing controversy after researchers discovered that its high benchmark scores on LM Arena were achieved using a fine-tuned “experimental chat version” not available to developers. The revelation, following Meta’s April 5, 2025, launch of the Llama 4 models, has led to accusations of benchmark manipulation and calls for greater trans…

12 days ago

