In late January, DeepSeek took the global tech world by storm with the release of two large language models (LLMs) said to be “on par” with American products at a fraction of the cost. Among them, the open-source reasoning model DeepSeek-R1 can solve some of the same scientific problems as o1, OpenAI’s most advanced LLM.

While the world was surprised, researchers in China said the achievement was entirely predictable and in line with Beijing's ambition to become a leading power in artificial intelligence (AI).

Yunji Chen, a computer scientist at the Institute of Computing Technology of the Chinese Academy of Sciences, points out that it was only a matter of time before a company like DeepSeek appeared in China.

That, he argues, follows from the huge amount of investment capital pouring into companies developing LLMs and from the country's large pool of PhD holders in STEM (science, technology, engineering or mathematics) fields.

“If there were no DeepSeek, there would be other Chinese LLMs,” Chen said.

The prediction was quickly borne out. A few days after the DeepSeek “earthquake”, Alibaba released its most advanced LLM to date, Qwen2.5-Max, which it claims outperforms DeepSeek-V3.

Moonshot AI and ByteDance also announced new reasoning models, Kimi 1.5 and 1.5-pro, which they claim outperform o1 on some benchmark tests.

Government priorities

In 2017, the Chinese government announced its intention to become a world leader in AI by 2030, with the interim goal of achieving major breakthroughs “so that technology and applications reach world-leading levels” by 2025.

To that end, developing an AI talent pipeline became a top priority. By 2022, China’s Ministry of Education had approved 440 universities to offer AI majors, according to a report from Georgetown University’s Center for Security and Emerging Technology (CSET).

That same year, China produced half of the world's top AI researchers, while the US accounted for just 18%, according to the consultancy MacroPolo.

DeepSeek surprised the world with a series of low-cost, high-performance large language models. Photo: Bloomberg

Marina Zhang, a science policy researcher at the University of Technology Sydney, said DeepSeek likely benefited from government investment in AI training and talent development, including numerous scholarships, research grants and partnerships between academia and industry.

For example, state-backed initiatives like the National Engineering Laboratory for Deep Learning Technology and Applications have trained thousands of AI experts.

It is hard to find exact figures on DeepSeek's workforce, but founder Liang Wenfeng has said that the company recruits graduates and PhD students from the country's top universities.

Some members of the leadership team are under 35 and have grown up with China's rise as a tech superpower, Zhang said. "They are deeply motivated by self-reliance in innovation."

Liang, 39, graduated with a degree in computer science from Zhejiang University. He co-founded the hedge fund High-Flyer nearly a decade ago and founded DeepSeek in 2023.

National policies that promote an ecosystem for AI model development will help companies like DeepSeek attract both funding and talent, according to Jacob Feldgoise, who studies AI talent in China at CSET.

But despite the rise in AI courses at universities, Feldgoise says it is unclear how many students are graduating with AI degrees and whether they are being taught the skills companies need.

In recent years, Chinese AI companies have complained that graduates from these programs are not meeting their expectations, prompting some to partner with universities to improve quality.

"Tempering"

Perhaps the most impressive element of DeepSeek's success, researchers say, is that it developed DeepSeek-R1 and Janus-Pro-7B under US government export controls that have blocked China's access to advanced AI chips since 2022.

According to Zhang, DeepSeek represents a distinctly Chinese approach to innovation, emphasizing efficiency in the face of a host of constraints.

Liang's startup says it used about 2,000 Nvidia H800 chips to train DeepSeek-V3. By contrast, Llama 3.1 405B, a sophisticated LLM released by Meta in July 2024, relied on more than 16,000 Nvidia H100 chips.


In a 2022 WeChat post, High-Flyer said it had 10,000 of Nvidia’s older A100 chips. “The problem we face has never been money, but the ban on high-end chips,” Liang told Chinese media in July 2024.

DeepSeek uses a variety of methods to make its models more efficient. For example, it employs a mixture-of-experts (MoE) architecture, which splits the network into many specialist sub-networks (“experts”) and activates only a few of them for each input token, so far fewer parameters need to be computed per step than in a conventional dense model.

It helps DeepSeek train models with fewer chips, according to University of Sydney computer scientist Chang Xu.
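In rough terms, an MoE layer can be pictured with the minimal sketch below, written in PyTorch. It is a hypothetical toy, not DeepSeek's actual code: the layer sizes, the number of experts and the top-2 routing are illustrative assumptions. A small “router” scores the experts for each token, and only the highest-scoring few run.

```python
# Minimal mixture-of-experts sketch (hypothetical toy, assuming PyTorch;
# sizes and routing are illustrative, not DeepSeek's configuration).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=64, d_hidden=256, n_experts=8, top_k=2):
        super().__init__()
        # Each expert is a small independent feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )
        self.router = nn.Linear(d_model, n_experts)  # scores experts per token
        self.top_k = top_k

    def forward(self, x):                      # x: (n_tokens, d_model)
        scores = self.router(x)                # (n_tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over chosen experts
        out = torch.zeros_like(x)
        # Only the top-k experts run for each token; the rest stay idle,
        # which is where the compute saving comes from.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * expert(x[mask])
        return out

print(MoELayer()(torch.randn(10, 64)).shape)   # torch.Size([10, 64])
```

Because compute scales with the number of active experts rather than the total parameter count, an MoE model can be made much larger without a proportional rise in training cost.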

Another technique is multi-head latent attention (MLA), which compresses the attention keys and values the model must keep in memory during generation into a much smaller latent representation, allowing it to handle more context with less memory.
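The idea can be sketched as follows, again as a hypothetical PyTorch toy rather than DeepSeek's implementation (which also handles details such as rotary position embeddings): instead of caching full per-head keys and values for every generated token, the model caches one small latent vector per token and reconstructs the keys and values from it on the fly.

```python
# Sketch of the key-value compression idea behind multi-head latent
# attention (hypothetical toy, assuming PyTorch; dimensions are made up,
# and DeepSeek's real design also handles rotary embeddings and more).
import torch
import torch.nn as nn

d_model, d_latent, n_heads, d_head = 64, 16, 4, 16

down_kv = nn.Linear(d_model, d_latent)        # compress token to a latent
up_k = nn.Linear(d_latent, n_heads * d_head)  # reconstruct keys from latent
up_v = nn.Linear(d_latent, n_heads * d_head)  # reconstruct values from latent
q_proj = nn.Linear(d_model, n_heads * d_head)

def attend(x, kv_cache):
    # x: (batch, seq, d_model); kv_cache: (batch, seq, d_latent)
    b, t, _ = x.shape
    q = q_proj(x).view(b, t, n_heads, d_head).transpose(1, 2)
    k = up_k(kv_cache).view(b, t, n_heads, d_head).transpose(1, 2)
    v = up_v(kv_cache).view(b, t, n_heads, d_head).transpose(1, 2)
    att = (q @ k.transpose(-2, -1)) / d_head ** 0.5
    return (att.softmax(-1) @ v).transpose(1, 2).reshape(b, t, -1)

x = torch.randn(1, 10, d_model)
kv_cache = down_kv(x)              # during generation, only this is cached
print(attend(x, kv_cache).shape)   # torch.Size([1, 10, 64])
# Per token the cache holds d_latent = 16 numbers instead of the
# 2 * n_heads * d_head = 128 a standard KV cache would need: 8x smaller.
```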

DeepSeek’s achievements could be a “guidepost” for countries with AI ambitions but lacking the financial and hardware resources to train massive LLMs, said Yanbo Wang, a science policy researcher at the University of Hong Kong.

(According to Nature, Fortune)

Source: https://vietnamnet.vn/bi-mat-dang-sau-deepseek-trung-quoc-khien-ca-the-gioi-chao-dao-voi-cu-soc-ai-2391114.html