New research reveals the secret to DeepSeek's success

Chinese startup DeepSeek’s R1 artificial intelligence model, which shocked the US stock market when it launched in January, is now the subject of a peer-reviewed study for the first time. The study shows how the company trained a powerful LLM for roughly $300,000.

R1 is designed to excel at reasoning tasks such as math and programming, making it a low-cost rival to tools developed by US tech giants.

This is an “open weight” model, which is free to download and is currently the most popular model on the Hugging Face platform, with over 10.9 million downloads.

The Nature study, an updated version of a manuscript released in January, reveals for the first time that training R1 cost just $294,000, on top of the roughly $6 million spent on building the base model.

This figure is much lower than the tens of millions of dollars that competitors are said to have spent.

DeepSeek said R1 was trained primarily on Nvidia H800 chips, which the US has barred from export to China since 2023.

R1’s breakthrough was its use of “pure reinforcement learning,” in which the model learns through trial and error and is rewarded for correct answers, rather than learning from examples chosen by humans. It also scores its own attempts using estimates instead of relying on a separate scoring algorithm, a technique called “group relative policy optimization,” which helps boost performance.
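The scoring idea described above can be illustrated with a minimal sketch. It assumes the commonly described form of group relative policy optimization, in which several answers are sampled for the same prompt and each is scored against the average reward of its group. The function name, the 0-or-1 rewards, and the sample group are hypothetical, and the sketch leaves out the rest of the training objective; it is not DeepSeek's code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each sampled answer relative to the other answers in its group.

    The group mean serves as the baseline, so no separate scoring
    model is needed. `eps` (an assumed small constant) avoids division
    by zero when every answer in the group earns the same reward.
    """
    baseline = mean(rewards)
    spread = pstdev(rewards)
    return [(r - baseline) / (spread + eps) for r in rewards]


# Hypothetical group: four sampled answers to one math prompt,
# rewarded 1.0 if the final answer is correct and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

In this sketch, correct answers receive a positive score and incorrect ones a negative score, so training would push the model toward the better attempts in each group. When every answer earns the same reward, all scores are zero and the group provides no learning signal.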

“The rigorous peer review process helps validate the model’s value and reliability,” says Huan Sun, a researcher at Ohio State University. “Other firms should do the same.”

Lewis Tunstall, a machine learning engineer at Hugging Face, said this is an important precedent because transparency in AI development helps to assess risks more accurately.

DeepSeek says R1 was not trained using data from OpenAI's models, though it admits the base model was trained on web data, which could include AI-generated content.

Experts say that although this is difficult to verify conclusively, current evidence suggests that pure reinforcement learning alone is sufficient to achieve high performance.

On the ScienceAgentBench benchmark, R1 did not top the accuracy chart, but it struck a good balance between accuracy and cost. Researchers are now looking to apply DeepSeek's method to enhance the reasoning capabilities of existing LLMs and to extend it to areas beyond math and programming.

According to Mr. Tunstall, R1 has “started a revolution” in artificial intelligence development.

(TTXVN/Vietnam+)

Source: https://www.vietnamplus.vn/nghien-cuu-moi-tiet-lo-bi-quyet-thanh-cong-cua-deepseek-post1062474.vnp