Mathematicians Test AI With Contamination-Free Problems

The First Proof initiative tested AI on 10 unpublished math lemmas, revealing only two correct AI solutions and highlighting current AI limits in original mathematical research.

  • On February 5, 2026, eleven leading mathematicians launched the First Proof challenge to test LLMs' ability to do mathematical research, and they released the results early on Saturday on the arXiv preprint server.
  • To avoid training‑set leakage, the organizers used unpublished research problems and posted the answers in encrypted form on 1stproof.org before February 13, 2026.
  • Each participating mathematician had already solved their own problem, and the problems were tested on GPT‑5.1 Pro and Gemini 3 Pro under a one‑attempt, no‑hints protocol, with the answers kept encrypted.
  • The AIs produced confident solutions, but only two were correct: those for the ninth and tenth problems. None of the large language models came close to solving all ten, showing that they still struggle compared with mathematicians.
  • The team plans a second round with tighter controls and will release more details on March 14, 2026. The organizers aim to turn First Proof into a permanent benchmark for mathematics and other domains.
Insights by Ground AI

10 Articles

Bias Distribution

  • 50% of the sources lean Left, 50% of the sources are Center

Scientific American broke the news on Saturday, February 14, 2026.