Published 12 days ago • Updated 10 days ago

New study accuses LM Arena of gaming its popular AI benchmark

Summary by Ars Technica

The popular AI vibe test may not be as fair as it seems.

7 Articles

Ars Technica

Reposted by

technewstube.com

Center

New study accuses LM Arena of gaming its popular AI benchmark

The popular AI vibe test may not be as fair as it seems.

11 days ago·United States

TechCrunch

Center

Study accuses LM Arena of helping top AI labs game its benchmark

A new study accuses LM Arena, the organization behind the popular AI benchmark Chatbot Arena, of helping some AI companies game its leaderboard.

12 days ago·United States

BetaKit

Cohere Labs head calls “unreliable” AI leaderboard rankings a “crisis” in the field

The head of Cohere’s research division is concerned that alleged unreliability in the rankings of a popular chatbot leaderboard amounts to a “crisis” in artificial intelligence (AI) development. A new study co-authored by Sara Hooker, head of Cohere Labs, along with researchers at Cohere and leading universities, claims that large AI companies have been “gaming” the crowd-sourced chatbot ranking platform LM Arena to boost the ranking of their l…

10 days ago

Fredzone

These accusations of favouritism undermine the reputation of LM Arena...

A study conducted by researchers from Cohere, Stanford, MIT and Ai2 accuses LM Arena, the organisation behind the AI benchmark Chatbot Arena, of discriminatory practices that benefit certain giants in the sector. According to the authors, companies like Meta, OpenAI, Google and Amazon allegedly benefited from privileged access allowing them to test several variants of their ...

11 days ago

the-decoder.com

Popular AI benchmark LMArena allegedly favors large providers, study claims

Researchers say the ranking system favors major providers like OpenAI, Google, and Meta. LMArena disputes the claims.

11 days ago

slashdot.org

Study Accuses LM Arena of Helping Top AI Labs Game Its Benchmark

An anonymous reader shares a report: A new paper from AI lab Cohere, Stanford, MIT, and Ai2 accuses LM Arena, the organization behind the popular crowdsourced AI benchmark Chatbot Arena, of helping a select group of AI companies achieve better leaderboard scores at the expense of rivals. According...

12 days ago
