Researchers Show That Hundreds of Bad Samples Can Corrupt Any AI Model
5 Articles
5 Articles
Anthropic has discovered that few hostile texts are enough to implant a hidden door in a linguistic model, regardless of its size...
In the training of large language models (LLMs), it tends to be thought that the quality and mass quantity of data are guarantors of security. But a recent study by Anthropic, in collaboration with the UK AI Safety Institute and the Alan Turing Institute, has turned this idea upside down. Research has shown that there is no need to contaminate large amounts of data to compromise a model: just 250 malicious documents are enough to insert a functi…
Finding a bit worrying of a joint study Anthropic and the UK AI Security Institute, with the Alan Turing Institute: 250 malicious / contaminated documents would suffice to produce a back door in a LLM and thus corrupt it over time. Whether in a LLM with 13 billion parameters or a 600 million model, only the training time changes... "Our results question the perceived idea that attackers have to control a (high) percentage of training data; they …
AI models such as ChatGPT, Gemini and Claude can develop "door-to-door" vulnerabilities when corrupt documents are inserted into their training dataIn a study conducted jointly with the UK AI Security Institute and the Alan Turing Institute, Anthropic discovered that only 250 malicious documents can create a "door-to-door" vulnerability in a large language model, regardless of the size of the model or the volume of the data...
Coverage Details
Bias Distribution
- 100% of the sources lean Left
Factuality
To view factuality data please Upgrade to Premium