Published 1 month ago • loading... • Updated 1 month ago

Just 250 Documents Can Poison AI Models, Study Finds

Anthropic released a report today detailing how just 250 malicious documents can introduce a backdoor vulnerability in large language models regardless of model size.
Researchers from Anthropic, in partnership with leading UK organizations focused on AI safety and research, set out to dispute the idea that attackers need access to a certain proportion of training data to carry out successful data-poisoning attacks.
Researchers found that introducing a fixed, small number of poisoned documents enables adversaries to trigger specific hidden behaviors using certain phrases, even in models with up to 13 billion parameters.
The report highlights that only 250 malicious documents—just 0.00016% of training tokens—were needed to compromise a large model, indicating that executing data poisoning attacks against large-scale AI systems might require less effort than was previously assumed.
Anthropic cautions that data-poisoning attacks seem more feasible than once assumed and urges further research on defenses, while also emphasizing ongoing safety measures in real-world systems.

Insights by Ground AI

Podcasts & Opinions

Podcast Mention

Center

Techmeme Ride Home

Daily Tech and News podcast

Center

Daily Tech and News podcast

China’s Getting Tetchy Again

Techmeme Ride Home discuss Anthropic's study on AI poisoning vulnerabilities

1 month ago

Listen to Full Episode Full Episode Unlock Timestamp

Get Vantage — Podcasts, Ratings, Timestamps

Podcasts & Opinions

40 Articles

20minutos

Center

Alert: Manage to Infect Ai by Training It with only 250 Documents with Malware

An investigation shows that the main AI models can be manipulated with only 250 corrupt documents, which endangers their training.

1 month ago·Madrid, Spain

Read Full Article

Ars Technica

Center

AI models can acquire backdoors from surprisingly few malicious documents

Scraping the open web for AI training data can have its drawbacks. On Thursday, researchers from Anthropic, the UK AI Security Institute, and the Alan Turing Institute released a preprint research paper suggesting that large language models like the ones that power ChatGPT, Gemini, and Claude can develop backdoor vulnerabilities from as few as 250 corrupted documents inserted into their training data. That means someone tucking certain documents…

1 month ago·United States

Read Full Article

The Register

Center

Data quantity doesn't matter when poisoning an LLM

: Just 250 malicious training documents can poison a 13B parameter model - that's 0.00016% of a whole dataset

1 month ago

Read Full Article

Engadget

Reposted by

OODA Loop

Lean Left

Researchers find just 250 malicious documents can leave LLMs vulnerable to backdoors

New research has found that a small and fairly constant number of malicious documents can poison an LLM and create a backdoor.

1 month ago·United States

Read Full Article

TechXplore

Center

Size doesn't matter: Just a small number of malicious files can corrupt LLMs of any size

Large language models (LLMs), which power sophisticated AI chatbots, are more vulnerable than previously thought. According to research by Anthropic, the UK AI Security Institute and the Alan Turing Institute, it only takes ...

2 months ago

Read Full Article

Nature

Center

Medical large language models are vulnerable to data-poisoning attacks - Nature Medicine

The adoption of large language models (LLMs) in healthcare demands a careful analysis of their potential to spread false medical knowledge. Because LLMs ingest massive volumes of data from the open Internet during training, they are potentially exposed to unverified medical knowledge that may include deliberately planted misinformation. Here, we perform a threat assessment that simulates a data-poisoning attack against The Pile, a popular datase…

2 months ago·United Kingdom

Read Full Article

Think freely.Subscribe and get full access to Ground NewsSubscriptions start at $9.99/year