Mathematicians Test AI With Contamination-Free Problems
The First Proof initiative tested AI on 10 unpublished math lemmas, revealing only two correct AI solutions and highlighting current AI limits in original mathematical research.
- On February 5, 2026, eleven leading mathematicians set the First Proof challenge to test LLMs' ability at research mathematics, and released the results early, on Saturday, on the arXiv preprint server.
- To avoid training-set leakage, the organizers used unpublished research problems and posted encrypted answers on 1stproof.org before February 13, 2026.
- Under an encrypted-answer, one-attempt, no-hints protocol, each participating mathematician posed a problem they had already solved themselves and tested it on GPT-5.1 Pro and Gemini 3 Pro.
- The AIs produced confident-sounding solutions, but only two were correct (those to the ninth and tenth problems), and none of the models came close to solving all ten, showing that they still struggle compared with mathematicians.
- The team plans a second round with tighter controls and will release more details on March 14, 2026, while the organizers aim to turn First Proof into a permanent benchmark for mathematics and other domains.
10 Articles
Leading AI models struggle to solve original math problems
Mathematics, like many other scientific endeavors, is increasingly making use of artificial intelligence. Math is, of course, the backbone of AI, but mathematicians are also turning to these tools for tasks such as literature searches and checking manuscripts for errors. How well, though, can AI perform when it comes to solving genuine, high-level research problems?
Mathematicians Create New AI Math Test With Unpublished Problems - Can AI Really Solve Research Problems? - UNU Campus Computing Centre
Eleven leading mathematicians created First Proof, a rigorous benchmark that tests AI systems on unpublished research-level math problems, so that results reflect genuine capability rather than training-data contamination.
First Proof
To assess the ability of current AI systems to correctly answer research-level mathematics questions, we share a set of ten math questions which have arisen naturally in the research process of the authors. The questions had not been shared publicly until now; the answers are known to the authors of the questions but will remain encrypted for a short time.
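The announcement describes the answers only as "encrypted" and does not specify a scheme, so the Python sketch below is a hypothetical illustration of the underlying idea: publish a salted SHA-256 commitment to each answer before the deadline, then reveal the salt and plaintext once the challenge window closes. The function names `commit` and `verify`, and the sample answer string, are invented for illustration.

```python
import hashlib
import os


def commit(answer: str) -> tuple[str, bytes]:
    """Return a public commitment to `answer` and the secret salt needed to open it later."""
    salt = os.urandom(16)  # random salt so short or guessable answers cannot be brute-forced
    digest = hashlib.sha256(salt + answer.encode("utf-8")).hexdigest()
    return digest, salt  # publish `digest` now; keep `salt` and `answer` private


def verify(answer: str, salt: bytes, digest: str) -> bool:
    """Check a revealed (answer, salt) pair against the previously published commitment."""
    return hashlib.sha256(salt + answer.encode("utf-8")).hexdigest() == digest


# Hypothetical usage: commit before the challenge window, reveal and verify afterwards.
public_digest, secret_salt = commit("lemma 9: the bound is sharp for n >= 4")
assert verify("lemma 9: the bound is sharp for n >= 4", secret_salt, public_digest)
```

A commitment like this lets anyone verify after the fact that the answers were fixed before the AI attempts, without disclosing them in advance; the organizers' actual mechanism on 1stproof.org may differ.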
Melissa Heikkilä / Financial Times: Top AI researchers argue that AI is now more useful for mathematics thanks to the latest “reasoning” models, as math becomes a key way to test AI progress — OpenAI, Google DeepMind and Anthropic seek to use advanced maths to show how capable AI models really are
Coverage Details
Bias Distribution
- 50% of the sources lean Left, 50% of the sources are Center



