In January, Chinese startup DeepSeek released its open-source R1 reasoning model. The company said the large language model behind R1 was developed using less powerful chips and at a much lower cost than Western AI models.

Investors reacted to the news by selling off shares of Nvidia and other tech companies, wiping out $600 billion in market value for Nvidia in a single day. The world’s largest semiconductor company has since recovered most of that loss.

DeepSeek's large language models were developed using much weaker and cheaper chips than Western models. Photo: Bloomberg

In his latest video, Nvidia CEO Jensen Huang argues that the market's extreme reaction stems from investors misinterpreting DeepSeek's progress.

These investors question whether the trillions of dollars Big Tech is spending on AI infrastructure are necessary if models can be trained with less computing power.

However, Mr. Huang said the industry still needs enormous computing power for post-training methods, which teach AI models to reason and draw conclusions after their initial training.

As post-training methods become more diverse and advanced, demand grows for the computing power that Nvidia's chips provide.

According to the Nvidia CEO, investors assume the world revolves around pre-training and inference (asking an AI a question and getting an answer right away), but post-training is the most important part of AI: it is where a model learns to solve specialized problems.

Still, Huang doesn't deny that DeepSeek has "injected" more energy into the AI world. In an interview earlier this month, AMD CEO Lisa Su likewise said DeepSeek is driving innovations that are "good for AI applications."

The term pre-training refers to the initial stage of training a large language model (LLM), where the model learns from a large, diverse dataset, typically up to several trillion tokens.

The goal is to give the model a broad grasp of language, context, and general knowledge. This stage requires enormous computing power and data, and can cost hundreds of millions of dollars.

The term post-training, or fine-tuning, refers to taking a previously trained model and training it further on a more specific dataset, one that is usually smaller and focused on a particular domain or task.

Its purpose is to tune the model to perform better on specific scenarios and tasks that were not covered in depth during pre-training. The knowledge added during post-training improves performance on those tasks rather than expanding the model's general knowledge.
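To make the two stages concrete, here is a minimal, runnable sketch in PyTorch. It is purely illustrative: the tiny model, the random token data, and the hyperparameters are placeholders, not anything DeepSeek or Nvidia actually uses. Pre-training runs a next-token-prediction loop over a large, general dataset; post-training reuses the resulting weights and continues the same loop on a much smaller, domain-specific dataset, typically with fewer steps and a lower learning rate.

```python
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    """A toy next-token prediction model standing in for an LLM."""
    def __init__(self, vocab=1000, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab)

    def forward(self, tokens):                 # tokens: (batch, seq_len)
        hidden, _ = self.rnn(self.embed(tokens))
        return self.head(hidden)               # next-token logits per position

def train(model, next_batch, lr, steps):
    """Generic next-token-prediction training loop used for both stages."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        tokens = next_batch()                               # (batch, seq_len) token ids
        logits = model(tokens[:, :-1])                      # predict each following token
        targets = tokens[:, 1:]
        loss = loss_fn(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()

model = TinyLM()

# Pre-training: a large, diverse "general" corpus (random ids as a stand-in here).
general_batch = lambda: torch.randint(0, 1000, (32, 65))
train(model, general_batch, lr=3e-4, steps=200)

# Post-training / fine-tuning: a much smaller, domain-specific dataset, fewer steps,
# and a lower learning rate so the general knowledge is not overwritten.
domain_batch = lambda: torch.randint(0, 1000, (8, 65))
train(model, domain_batch, lr=3e-5, steps=50)
```

In a real system the same pattern holds at vastly larger scale, which is why the article notes that pre-training alone can cost hundreds of millions of dollars, while fine-tuning reuses those weights on far less data.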

(According to Insider, Reddit)