DeepSeek has released a new AI model capable of processing documents with 7 to 20 times fewer tokens than traditional methods. Photo: The Verge.
According to SCMP, DeepSeek has released a new multimodal artificial intelligence (AI) model that can process large, complex documents with 7 to 20 times fewer tokens than traditional text-based processing methods.
Tokens are the smallest units of text that an AI model processes. Reducing the number of tokens lowers computational cost and makes the model more efficient.
To achieve this, the DeepSeek-OCR (optical character recognition) model uses visual perception as a means of compressing information. This approach allows large language models to process massive volumes of text without computational costs growing proportionally.
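As a rough illustration of what such compression means in practice, consider the back-of-envelope arithmetic below. All figures are hypothetical placeholders chosen within the reported 7-20x range, not DeepSeek's published numbers:

```python
# Back-of-envelope illustration of vision-based token compression.
# All numbers here are hypothetical, not DeepSeek's published figures.

text_tokens_per_page = 2000        # a dense page encoded as ordinary text tokens
compression_ratio = 10             # assumed ratio within the reported 7-20x range
vision_tokens_per_page = text_tokens_per_page // compression_ratio

pages = 500                        # a large document
print(f"Text tokens:   {pages * text_tokens_per_page:,}")    # 1,000,000
print(f"Vision tokens: {pages * vision_tokens_per_page:,}")  # 100,000
```

Since the cost of running a large language model grows with the number of tokens it must attend over, a tenfold reduction in tokens translates directly into lower compute and memory requirements for the same document.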
“Through DeepSeek-OCR, we have demonstrated that using visual perception to compress information can achieve significant token reductions of 7 to 20 times across different historical context stages, offering a promising direction,” DeepSeek stated.
According to the company's blog post, DeepSeek-OCR consists of two main components: DeepEncoder and DeepSeek3B-MoE-A570M, which acts as the decoder.
In this model, DeepEncoder is the core component: it keeps activations low when processing high-resolution input while achieving high compression ratios, so a document is represented with far fewer vision tokens.
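A minimal sketch of the token-reduction idea follows. It assumes a simple patchify-then-downsample scheme; the patch size and the 16x compression factor are illustrative assumptions, not DeepEncoder's actual design:

```python
# Illustrative sketch: how spatially downsampling patch tokens shrinks the token count.
# Patch size and the 4x-per-dimension (16x total) downsampling are assumptions,
# not DeepEncoder's actual architecture.

def vision_token_count(height, width, patch=16, downsample=4):
    """Return (raw patch tokens, tokens after downsampling by `downsample`
    in each spatial dimension, i.e. downsample**2 fewer tokens)."""
    raw = (height // patch) * (width // patch)
    compressed = (height // (patch * downsample)) * (width // (patch * downsample))
    return raw, compressed

raw, compressed = vision_token_count(1024, 1024)
print(f"Raw patch tokens:      {raw}")        # 4096
print(f"After 16x compression: {compressed}") # 256
```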
The decoder, meanwhile, is a Mixture-of-Experts (MoE) model with roughly 570 million activated parameters, tasked with reconstructing the original text from the compressed visual representation. The MoE architecture splits the model into specialized subnetworks, each handling a subset of the input, which optimizes performance without activating the entire model.
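The routing behavior described above is the standard MoE pattern: a small router picks a few experts per token and leaves the rest inactive. The sketch below shows generic top-k gating with hypothetical sizes, not DeepSeek's actual configuration:

```python
import numpy as np

# Minimal Mixture-of-Experts routing sketch with top-k gating.
# Expert count, hidden size, and top_k are illustrative, not DeepSeek's configuration.

rng = np.random.default_rng(0)
num_experts, d_model, top_k = 8, 64, 2

# Each "expert" is a small feed-forward subnetwork (here just one weight matrix).
experts = [rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(num_experts)]
router = rng.standard_normal((d_model, num_experts)) * 0.02

def moe_forward(x):
    """Route token vector x to its top-k experts; the remaining experts stay inactive."""
    logits = x @ router
    top = np.argsort(logits)[-top_k:]                          # indices of selected experts
    weights = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over chosen experts
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(d_model)
out = moe_forward(token)
print(out.shape)  # (64,) -- only 2 of the 8 experts were activated for this token
```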
On OmniDocBench, a benchmark for document parsing, DeepSeek-OCR outperforms leading OCR models such as GOT-OCR 2.0 and MinerU 2.0 while using significantly fewer tokens.
Source: https://znews.vn/deepseek-lai-co-dot-pha-post1595902.html