Apple study finds cutting-edge AI models from OpenAI and DeepSeek undergo 'complete collapse' when problems get too difficult
JUN 10 – Apple's study reveals that advanced AI reasoning models fail completely on complex puzzles, showing zero success beyond certain difficulty thresholds despite claims of genuine step-by-step reasoning.
- On June 6, Apple published a paper titled 'The Illusion of Thinking', reporting that large reasoning models fail at complex logic tasks such as the Tower of Hanoi.
- The research arose from testing reasoning-optimized models on puzzles and benchmarks, showing that their accuracy collapses as complexity increases even when the models are given the correct algorithm (see the sketch after this list).
- Apple found that models generate hallucinations up to 48% of the time and lack generalizable problem-solving skills, with performance dropping to zero beyond certain complexity thresholds.
- The authors found that model performance declines sharply and eventually collapses entirely once problems surpass a specific complexity level, indicating that current models depend more on pattern recognition than on genuine reasoning.
- These findings challenge claims about near-term artificial general intelligence and point to fundamental limits in large reasoning models, calling for more rigorous scientific analysis.
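For context, the "correct algorithm" at issue is short and well known. A minimal Python sketch of the classic recursive Tower of Hanoi solution (function and variable names are ours, not the paper's):

```python
def hanoi(n, source, target, spare, moves):
    """Move n disks from source to target, using spare as scratch space."""
    if n == 0:
        return
    hanoi(n - 1, source, spare, target, moves)  # clear the way for the largest disk
    moves.append((source, target))              # move the largest disk directly
    hanoi(n - 1, spare, target, source, moves)  # stack the rest back on top

moves = []
hanoi(10, "A", "C", "B", moves)
print(len(moves))  # 1023: the optimal solution takes 2**n - 1 moves
```

The move count doubles with each added disk, which is why even a modest increase in puzzle size makes the required reasoning chain dramatically longer.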
84 Articles
Benchmarking hallucinations: New metric tracks where multimodal reasoning models go wrong
Over the past decades, computer scientists have introduced increasingly sophisticated machine learning-based models, which can perform remarkably well on various tasks. These include multimodal large language models (MLLMs), ...


Do reasoning models really “think” or not? Apple research sparks lively debate, response
Ultimately, the big takeaway for ML researchers is that before proclaiming an AI milestone—or obituary—make sure the test itself isn’t flawed
AI Is Artificial, Not Intelligent
Will AI dominate and begin to think for itself? Apple’s recent research may have unintentionally revealed the truth many of us have long suspected: AI is artificial, but it is not intelligent. In a pre-WWDC 2025 paper, Apple exposed a fundamental flaw in the latest AI systems known as large reasoning models (LRMs). These systems — including OpenAI’s o1 and o3, DeepSeek R1, Claude 3.7 Sonnet Thinking, and Google’s Gemini Flash Thinking — demonst…
A controversial paper claims that models like ChatGPT do not think or reason, but merely imitate patterns. The researchers showed that these systems collapse when faced with complex problems.
AI flunks logic test: Multiple studies reveal illusion of reasoning
Apple researchers have uncovered a key weakness in today's most hyped AI systems – they falter at solving puzzles that require step-by-step reasoning. In a new paper, the team tested several leading models on the Tower of Hanoi, an age-old logic puzzle, and found that performance collapsed as complexity increased.
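Scoring a model on this puzzle is mechanical: simulate each proposed move and check that it is legal. A plausible checker along those lines (the peg encoding and function name are our illustration, not the paper's code):

```python
def valid_hanoi_sequence(n, moves):
    """Check whether a move list legally solves an n-disk Tower of Hanoi.

    Pegs are 0, 1, 2; disks 1..n start on peg 0 and must end on peg 2.
    Each move is a (from_peg, to_peg) pair.
    """
    pegs = [list(range(n, 0, -1)), [], []]       # largest disk at the bottom of peg 0
    for src, dst in moves:
        if not pegs[src]:
            return False                         # moving from an empty peg
        if pegs[dst] and pegs[dst][-1] < pegs[src][-1]:
            return False                         # larger disk placed on a smaller one
        pegs[dst].append(pegs[src].pop())
    return pegs[2] == list(range(n, 0, -1))      # solved iff everything ended on peg 2

# The optimal 3-disk solution (7 moves) passes:
print(valid_hanoi_sequence(3, [(0, 2), (0, 1), (2, 1), (0, 2), (1, 0), (1, 2), (0, 2)]))  # True
```

Because validity is checkable move by move, a single illegal move anywhere in a long transcript fails the whole attempt, which makes accuracy on large instances an unforgiving, all-or-nothing measure.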
New Apple study challenges whether AI models truly “reason” through problems
In early June, Apple researchers released a study suggesting that simulated reasoning (SR) models, such as OpenAI's o1 and o3, DeepSeek-R1, and Claude 3.7 Sonnet Thinking, produce outputs consistent with pattern-matching from training data when faced with novel problems requiring systematic thinking. The researchers found similar results to an April study that evaluated models on United States of America Mathematical Olympiad (USAMO) problems, showing that these …
Coverage Details
Bias Distribution
- 44% of the sources are Center