"Vintage LLMs" Could Open a New Field of Historical Research
The model uses pre-1931 scans to avoid copyright problems and to test whether historical language models can support new research, its creators said.
- On Monday, researchers released Talkie-1930, a 13B parameter language model trained exclusively on English-language text published before 1931, now available via Hugging Face and GitHub.
- By using a 1930 cutoff, the creators sidestep copyright issues; material published in 1930 entered the public domain on January 1, 2026, enabling the project. The team drew inspiration from AI researcher Owain Evans's concept of "vintage LLMs."
- Training on scanned physical sources introduces optical character recognition noise; the team found that models trained on OCR'ed pre-1931 texts reached only 30 percent of the performance of models trained on human-transcribed text. Although trained on 260 billion tokens, Talkie underperforms modern models on standard benchmarks.
- Talkie serves as an experimental platform for temporal generalization rather than a production baseline. The model exhibits "temporal leakage," retaining knowledge of post-1930 events like FDR's 1936 presidency despite its training cutoff.
- The team plans to scale Talkie significantly, potentially enabling multi-agent historical simulations and advancing AI understanding of how models navigate temporal concepts. This research may bridge STEM and humanities by creating open-source collaborative frameworks.
22 Articles
Vintage chatbot lives in the past like an elderly relative
Talkie's training data stops at the end of 1930, and its creators hope it'll help us better understand how AI thinks. If you're tired of interacting with a bot that spews Nazi propaganda or refers to itself as MechaHitler, you could sign off of Elon Musk's xAI. Or, just to be sure, use an LLM whose training data ends in 1930, three years before the Nazis took power in Germany and nine years before World War II started.…
What if artificial intelligence let us converse with the world... as it was before the 1930s? With Talkie, researchers have recreated an AI "out of time" to test its limits, its biases and its vision of the future.
Talkie-1930 is an AI model with 13 billion parameters trained only on texts published before 1931 (books, press, science), all in the public domain. It knows nothing of the internet or of later events, which avoids benchmark contamination and makes it possible to study how a model changes depending on its data. Developed by a non-profit team supported by Anthropic, it is available in open versions. Its answers reflect the logic of its time: coherent political anal…
Coverage Details
Bias Distribution
- 50% of the sources lean Left, 50% of the sources are Center