Mathematicians Test AI With Contamination-Free Problems
The First Proof initiative tested AI on 10 unpublished math lemmas, revealing only two correct AI solutions and highlighting current AI limits in original mathematical research.
- On February 5, 2026, eleven leading mathematicians set the First Proof challenge to test LLMs' ability at research mathematics, and released the results early, on Saturday, on the arXiv preprint server.
- To avoid training-set leakage, the organizers used unpublished research problems and posted encrypted answers on 1stproof.org before February 13, 2026.
- Under an encrypted-answer, one-attempt, no-hints protocol, each participating mathematician posed a problem they had already solved themselves and tested it on GPT-5.1 Pro and Gemini 3 Pro.
- The AIs produced confident-sounding solutions, but only two were correct (those to the ninth and tenth problems), and none of the models came close to solving all ten, showing that they still struggle compared with mathematicians.
- The team plans a second round with tighter controls and will release more details on March 14, 2026, while the organizers aim to turn First Proof into a permanent benchmark for mathematics and other domains.
10 Articles
Leading AI models struggle to solve original math problems
Mathematics, like many other scientific endeavors, is increasingly making use of artificial intelligence. Math is, of course, the backbone of AI, but mathematicians are also turning to these tools for tasks such as literature searches and checking manuscripts for errors. How well, though, can AI perform when it comes to solving genuine, high-level research problems?
Mathematicians Create New AI Math Test With Unpublished Problems - Can AI Really Solve Research Problems? - UNU Campus Computing Centre
Eleven leading mathematicians created First Proof, a rigorous benchmark that tests AI systems on unpublished research-level math problems, so that results reflect genuine capability rather than training-data contamination.
First Proof
To assess the ability of current AI systems to correctly answer research-level mathematics questions, we share a set of ten math questions which have arisen naturally in the research process of the authors. The questions had not been shared publicly until now; the answers are known to the authors of the questions but will remain encrypted for a short time.
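The announcement describes the answers only as "encrypted" and does not specify a scheme, so the Python sketch below is a hypothetical illustration of the underlying idea: publish a salted SHA-256 commitment to each answer before the deadline, then reveal the salt and plaintext once the challenge window closes. The function names `commit` and `verify`, and the sample answer string, are invented for illustration.

```python
import hashlib
import os


def commit(answer: str) -> tuple[str, bytes]:
    """Return a public commitment to `answer` and the secret salt needed to open it later."""
    salt = os.urandom(16)  # random salt so short or guessable answers cannot be brute-forced
    digest = hashlib.sha256(salt + answer.encode("utf-8")).hexdigest()
    return digest, salt  # publish `digest` now; keep `salt` and `answer` private


def verify(answer: str, salt: bytes, digest: str) -> bool:
    """Check a revealed (answer, salt) pair against the previously published commitment."""
    return hashlib.sha256(salt + answer.encode("utf-8")).hexdigest() == digest


# Hypothetical usage: commit before the challenge window, reveal and verify afterwards.
public_digest, secret_salt = commit("lemma 9: the bound is sharp for n >= 4")
assert verify("lemma 9: the bound is sharp for n >= 4", secret_salt, public_digest)
```

A commitment like this lets anyone verify after the fact that the answers were fixed before the AI attempts, without disclosing them in advance; the organizers' actual mechanism on 1stproof.org may differ.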
Melissa Heikkilä / Financial Times: Top AI researchers argue that AI is now more useful for mathematics thanks to the latest “reasoning” models, as math becomes a key way to test AI progress — OpenAI, Google DeepMind and Anthropic seek to use advanced maths to show how capable AI models really are
Coverage Details
Bias Distribution
- 50% of the sources lean Left, 50% of the sources are Center



