Using improved techniques from DeepSeek's AI training, Huawei's Ascend chip has delivered outstanding performance. Photo: Reuters
Researchers working on Huawei's Pangu large language model (LLM) announced on June 4 that they had improved DeepSeek's original approach to training artificial intelligence (AI) by leveraging the company's proprietary hardware, SCMP reported.
Specifically, the paper published by Huawei's Pangu team, which includes 22 core collaborators and 56 additional researchers, introduced the concept of Mixture of Grouped Experts (MoGE), an upgraded version of the Mixture of Experts (MoE) technique that played a key role in DeepSeek's cost-effective AI models.
According to the paper, while MoE keeps execution costs low for models with very large parameter counts and offers strong learning capacity, it often leads to inefficiency: the experts are activated unevenly, which degrades performance when the model runs across multiple devices in parallel.
MoGE, by contrast, groups the experts during the selection process and balances the experts' workload more evenly, according to the researchers.
In AI training, the term “expert” refers to a specialized sub-model or component within a larger model. Each expert is designed to handle specific tasks or distinct types of data, allowing the overall system to draw on diverse expertise to improve performance.
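To make the contrast concrete, here is a minimal Python/NumPy sketch of the two routing schemes described above; the expert count, group size, and variable names are illustrative assumptions, not details from Huawei's paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup (not from Huawei's paper): 8 experts split into
# 2 equal groups, e.g. one group per device, activating 4 experts total.
scores = rng.normal(size=8)   # router scores for one token
num_groups, k_per_group = 2, 2
group_size = scores.size // num_groups

# Plain MoE: global top-k. Nothing prevents every selected expert from
# sitting in the same group, which is the uneven activation described above.
global_topk = sorted(np.argsort(scores)[-num_groups * k_per_group:].tolist())

# MoGE-style grouped routing: the same number of experts is activated
# inside every group, so each device receives an equal share of the work.
grouped_topk = sorted(
    g * group_size + int(idx)
    for g in range(num_groups)
    for idx in np.argsort(scores[g * group_size:(g + 1) * group_size])[-k_per_group:]
)

print("global top-k :", global_topk)   # may cluster on one device
print("grouped top-k:", grouped_topk)  # always k_per_group experts per group
```

Because the grouped scheme always activates exactly k_per_group experts in every group, a device hosting one group receives a fixed share of the work, which is the load-balancing property the researchers credit for MoGE's efficiency.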
According to Huawei, the training process consisted of three main phases: pre-training, long-context expansion, and post-training. Pre-training covered 13.2 trillion tokens, and long-context expansion ran on 8,192 Ascend chips, Huawei's most powerful AI processor, which is used to train AI models and is aimed at challenging Nvidia's dominance in high-end chip design.
By testing the new architecture on an Ascend neural processing unit (NPU) specifically designed to accelerate AI tasks, the researchers found that MoGE “results in better expert load balancing and more efficient performance for both model training and inference.”
As a result, Pangu outperformed models such as DeepSeek-V3, Alibaba's Qwen2.5-72B, and Meta Platforms' Llama-405B on most general English benchmarks and on all Chinese benchmarks, demonstrating superior performance in long-context training.
Source: https://znews.vn/huawei-tuyen-bo-huan-luyen-ai-tot-hon-deepseek-post1558359.html