DeepSeek reveals how it built a cheap AI model. Photo: Bloomberg.
In a research report published on May 15, DeepSeek shared details for the first time about how it built one of the world's most powerful open-source AI systems at a fraction of the cost of its competitors.
The study, titled “Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures,” was co-authored by founder Liang Wenfeng. DeepSeek attributes its success to co-designing hardware and software, an approach that sets it apart from the many companies that optimize software in isolation.
“DeepSeek-V3, trained on 2,048 NVIDIA H800 GPUs, demonstrates how hardware-aware model co-design can effectively address these challenges, enabling cost-efficient training and inference at scale,” the team wrote in the paper. DeepSeek and its backer, the hedge fund High-Flyer, stockpiled H800 chips before the US banned their export to China in 2023.
Aware of the hardware limitations and the “exorbitant costs” of training large language models (LLMs), the technology underlying chatbots such as OpenAI’s ChatGPT, the DeepSeek team implemented a series of technical optimizations that, according to the paper, increase memory efficiency, speed up inter-chip communication, and raise the efficiency of the AI infrastructure as a whole.
In addition, DeepSeek emphasizes the role of the Mixture-of-Experts (MoE) architecture. This is a machine learning technique that divides a model into specialized sub-networks, or “experts”; a gating network routes each input to a small subset of them, and their outputs are combined to produce the result.
Because only a few experts are active for any given input, MoE reduces training costs and speeds up inference. The method has since been widely adopted across China's tech industry, including in Alibaba's latest Qwen3 model.
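The routing idea behind MoE can be sketched in a few lines of Python. This is an illustrative toy, not DeepSeek's implementation; the class name, layer sizes, and the use of plain linear maps as "experts" are all hypothetical choices made for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

class TinyMoE:
    """Toy Mixture-of-Experts layer: a gating network scores every
    expert for an input, but only the top-k experts actually run."""

    def __init__(self, dim, n_experts=4, top_k=2):
        self.top_k = top_k
        # Each "expert" is just a small linear map in this sketch.
        self.experts = [rng.standard_normal((dim, dim)) * 0.1
                        for _ in range(n_experts)]
        # Gating weights: map an input vector to one score per expert.
        self.gate = rng.standard_normal((dim, n_experts)) * 0.1

    def forward(self, x):
        scores = x @ self.gate                  # one score per expert
        top = np.argsort(scores)[-self.top_k:]  # indices of the best experts
        # Softmax over the selected experts' scores only.
        w = np.exp(scores[top] - scores[top].max())
        w /= w.sum()
        # Weighted sum of the chosen experts' outputs; the remaining
        # experts are never evaluated, which is where the compute
        # savings of MoE come from.
        return sum(wi * (x @ self.experts[i]) for wi, i in zip(w, top))

moe = TinyMoE(dim=8)
out = moe.forward(rng.standard_normal(8))
print(out.shape)
```

In a full model the gate is trained jointly with the experts, so the router learns which expert handles which kind of input; the sketch only shows the inference-time selection step.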
DeepSeek made headlines when it released its base V3 model in December 2024 and its R1 reasoning model in January. These releases caused a stir in global markets, contributing to a sharp sell-off in AI-related technology stocks.
Although DeepSeek has not disclosed further plans recently, it has kept the community's interest by publishing regular reports. In late March, the company released a minor update to DeepSeek-V3, and in late April it quietly released Prover-V2, a system for processing mathematical proofs.
Source: https://znews.vn/deepseek-tiet-lo-bi-mat-post1554222.html