AI Models More Vulnerable than Claimed when Faced with Iterative Attacks
5 Articles
Multi-Turn Attacks Expose Ongoing Weaknesses Across Frontier AI Models
A Cisco evaluation of frontier LLMs found that no tested model consistently resisted multi-turn adversarial attacks, raising concerns about current AI safety assessments. The research suggests that many widely used AI safety benchmarks may underestimate real-world risk because they focus primarily on single-turn prompt evaluations rather than adaptive, iterative attacks.
Key Takeaways from Cisco’s Research
Cisco found that every tested frontie…
Frontier AI models collapse under multi-turn AI attacks, Cisco finds
Attackers who probe large language models rarely give up after one refusal. They reframe, build context across turns, adopt personas, and escalate gradually. New research from Cisco’s AI threat intelligence team finds that the safety benchmarks used across the industry miss almost all of this behavior, and the gap between published scores and observed resilience runs wide enough to misrank leading models.
Single-turn versus multi-turn ASR by mod…
AI models more vulnerable than claimed when faced with iterative attacks
CISOs relying on LLM runtime guardrails and official safety scores when making security decisions about their organizations’ AI usage and model selection are due for a wakeup call. According to a new study from Cisco, frontier models from OpenAI, Anthropic, Google, xAI, and Amazon have significantly worse risk profiles when pressured in multi-turn attacks compared to when their safety is benchmarked using single prompts. “The dominant safety ben…
Cisco research finds standard AI safety benchmarks miss the real threat
Enterprises deploying closed AI models have generally relied on published safety benchmarks to assess risk before procurement and deployment decisions. New research from Cisco’s AI Threat Intelligence and Security Research team finds those benchmarks may systematically understate the threat. Standard safety tests submit a single adversarial prompt and record the model’s response. Multi-turn attacks work differently. An attacker maintains a conve…