DeepSeek Drops Open-Source Model that Compresses Text 10x Through Images, Defying Conventions
DeepSeek-OCR compresses text by up to 10 times while retaining 97% of information to help large language models process longer documents with lower computing costs.
- On Monday, DeepSeek released the open-source DeepSeek-OCR model on Hugging Face and GitHub, saying it compresses image-based text for LLMs using visual perception.
- DeepSeek built the model to address LLM long-context limits, as researchers said processing text as images can be more efficient for handling long-context documents with vision encoders.
- DeepSeek described the model's two-part architecture with a 380 million-parameter DeepEncoder and a DeepSeek3B-MoE-A570M decoder, trained on 30 million PDF pages in roughly 100 languages.
- Practically, the system supports high-throughput data generation for LLMs, producing training data at a scale of over 200,000 pages per day on a single NVIDIA A100 GPU, the company said.
- The paper says vision-text compression delivers major token reductions, reporting seven- to twenty-fold reductions and a compression factor of ten with 97% information retention; the release follows DeepSeek's V3 and R1 open-weight models.
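The headline numbers above can be related with simple arithmetic. The sketch below is illustrative only: the function name and the example token counts are assumptions, not taken from DeepSeek's code or paper, and show merely how a compression factor is computed from text-token and vision-token counts.

```python
# Illustrative sketch: how a "10x compression" figure is derived.
# compression_ratio() and the token counts below are hypothetical
# examples, not DeepSeek's implementation.

def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """Ratio of original text tokens to the vision tokens that encode them."""
    return text_tokens / vision_tokens

# A page an LLM tokenizer would split into 1,000 text tokens,
# re-encoded as 100 vision tokens, yields the reported ~10x factor:
ratio = compression_ratio(1000, 100)
print(ratio)  # -> 10.0
```

The reported trade-off is that retention degrades as compression grows more aggressive: the paper cites roughly 97% information retention at the ten-fold point, with larger (up to twenty-fold) reductions available at lower fidelity.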
16 Articles
DeepSeek, the Chinese artificial intelligence research company that has repeatedly challenged assumptions about AI development costs, has released a new model that fundamentally reimagines how large language models process information, and the implications extend far beyond its modest branding as an optical character recognition tool. The company's DeepSeek-OCR model, released Monday with full open-source code and weights, achieves what researcher…
DeepSeek releases new OCR model capable of generating 200,000 pages daily on a single GPU · TechNode
DeepSeek has unveiled DeepSeek-OCR: Contexts Optical Compression, an open-source model developed by its DeepSeek-AI research team. The new system introduces a visual-based method to compress long text contexts, improving recognition efficiency while cutting computation costs. According to the team, DeepSeek-OCR surpasses several mainstream models in benchmark tests with far fewer visual tokens. It can also produce more than 200,000 pages of trai…
Chinese AI researchers want to keep chatbots fast and cheap by representing long contexts as images. Optical context compression is intended to make AI assistants more efficient.
The Chinese start-up DeepSeek has just released an open-source multimodal AI model that can process complex documents while drastically reducing computation costs. By using visual perception as a powerful compression tool, DeepSeek-OCR opens the way to analyzing previously inaccessible volumes of data.
Coverage Details
Bias Distribution
- 75% of the sources are Center