Researchers Say AI Just Broke Every Benchmark for Autonomous Cyber Capability
3 Articles
3 Articles
Researchers say AI just broke every benchmark for autonomous cyber capability
Two of the most advanced artificial intelligence models — Anthropic’s Claude Mythos Preview and OpenAI’s GPT-5.5 — have significantly surpassed the already-accelerating pace at which AI systems are completing autonomous cybersecurity tasks, according to separate findings published Wednesday by the United Kingdom’s AI Security Institute (AISI) and Palo Alto Networks. The AISI, which conducts pre-deployment evaluations of frontier AI models on beh…
OpenAI's GPT-5.5 is as Good as Mythos at Finding Security Vulnerabilities - Schneier on Security
The UK’s AI Security Institute evaluated GPT-5.5’s ability to find security vulnerabilities, and found that it is comparable to Claude Mythos. Note that the OpenAI model is generally available. Here is the Institute’s evaluation of Mythos. And here is an analysis of a smaller, cheaper model. It requires more scaffolding from the prompter, but it is also just as good.
Coverage Details
Bias Distribution
- 100% of the sources are Center
Factuality
To view factuality data please Upgrade to Premium