"Vintage LLMs" Could Open a New Field of Historical Research
The model uses pre-1931 scans to avoid copyright problems and to test whether historical language models can support new research, its creators said.
- On Monday, researchers released Talkie-1930, a 13B parameter language model trained exclusively on English-language text published before 1931, now available via Hugging Face and GitHub.
- By using a 1930 cutoff, the creators sidestep copyright issues; material published in 1930 entered the public domain on January 1, 2026, enabling the project. The team drew inspiration from AI researcher Owain Evans's concept of "vintage LLMs."
- Training on scanned physical sources introduces optical character recognition noise; the team found that models trained on OCR'ed pre-1931 texts reached only 30 percent of the performance of models trained on human-transcribed text. Although trained on 260 billion tokens, Talkie underperforms modern models on standard benchmarks.
- Talkie serves as an experimental platform for temporal generalization rather than a production baseline. The model exhibits "temporal leakage," retaining knowledge of post-1930 events like FDR's 1936 presidency despite its training cutoff.
- The team plans to scale Talkie significantly, potentially enabling multi-agent historical simulations and advancing AI understanding of how models navigate temporal concepts. This research may bridge STEM and humanities by creating open-source collaborative frameworks.
22 Articles
Vintage chatbot lives in the past like an elderly relative
Talkie's training data stops at the end of 1930, and its creators hope it'll help us better understand how AI thinks. If you're tired of interacting with a bot that spews Nazi propaganda or refers to itself as MechaHitler, you could sign off of Elon Musk's xAI. Or, just to be sure, use an LLM whose training data ends in 1930, three years before the Nazis took power in Germany and nine years before World War II started.…
What if artificial intelligence let us converse with the world... as it was before the 1930s? With Talkie, researchers have recreated an AI "out of time" to test its limits, its biases and its vision of the future.
Talkie-1930 is an AI model with 13 billion parameters trained only on texts published before 1931 (books, press, science), all in the public domain. It knows nothing of the internet or of later events, which avoids benchmark contamination and makes it possible to study how a model changes depending on its data. Developed by a non-profit team supported by Anthropic, it is available in open versions. Its answers reflect the logic of its time: coherent political anal…
Coverage Details
Bias Distribution
- 50% of the sources lean Left, 50% of the sources are Center