Attackers Can Use Poetry To Derail AI Safeguards
Human-crafted poetic prompts increased AI jailbreaking success from 8% to 62% across 25 models, revealing systemic vulnerabilities in alignment safeguards, researchers said.
4 Articles
Can “adversarial poetry” save us from AI?
Turns out, the Terminator movies would have been more realistic if Sarah Connor had a poetry MFA. In a new paper titled “Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models”, a team of researchers has found that writing an LLM prompt in the form of an “adversarial poem” (what a phrase!) is a more effective way to get the model to disregard its programmed safety guardrails. Poetry is more powerful than we cou…
Well, friends, we already knew that LLMs had a few security loopholes, but this one is still quite... poetic. Indeed, researchers at DEXAI and Sapienza University in Rome have just discovered that reformulating a malicious request as a poem bypasses safety guardrails in more than 90% of cases for some AI providers. The team tested the robustness of 25 language models from 9 major providers: Google, OpenAI…
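For a rough sense of how a benchmark like this is scored, here is a minimal sketch, not the researchers' harness: each prompt is sent to each model in a single turn, a judge labels the reply safe or unsafe, and the attack success rate (ASR) is the fraction of prompts that slip past the guardrails. `query_model` and `judge_unsafe` are hypothetical placeholders for a provider API call and the safety evaluator.

```python
# Minimal sketch of scoring a single-turn jailbreak benchmark.
# Assumptions: `query_model` stands in for a real provider API call and
# `judge_unsafe` for the paper's safety judge; neither is the authors' code.

def query_model(model: str, prompt: str) -> str:
    # Placeholder: a real harness would call the provider's chat API here.
    return f"[{model}] canned reply to: {prompt}"

def judge_unsafe(reply: str) -> bool:
    # Placeholder heuristic; a real judge is a human or model-based rater.
    return "canned" in reply

def attack_success_rate(models, prompts):
    """Per-model ASR: fraction of prompts whose single reply is judged unsafe."""
    return {
        m: sum(judge_unsafe(query_model(m, p)) for p in prompts) / len(prompts)
        for m in models
    }

if __name__ == "__main__":
    models = ["provider-a/model-x", "provider-b/model-y"]  # 25 models in the study
    prompts = [f"<poetic rephrasing #{i}>" for i in range(3)]  # placeholders
    print(attack_success_rate(models, prompts))
```

A bypass rate above 90% for some providers would then correspond to an ASR above 0.9 for those models.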
Coverage Details
Bias Distribution
- 100% of the sources are Center