Zalo's female engineer brings Vietnamese technology to the world's leading AI conference.

Six years at Zalo have allowed Bui Thi Cuc to deepen her passion for artificial intelligence. Starting in a data science role right after graduation, Cuc has become a senior AI engineer at Zalo. In the summer of 2025, she represented the VMLU development team and presented its research at the ACL (Association for Computational Linguistics) conference in Vienna, Austria.

ACL is considered the leading academic conference on natural language processing (NLP), attracting over 2,000 researchers each year. Many foundational NLP works were presented here before becoming industry standards.

“From the very first day at the conference, I was overwhelmed by the scale and the open academic exchange,” Cúc recalled. Research activity ran from morning to night, with numerous posters on display, lengthy technical discussions, and the presence of labs from Meta, Google, Apple, and more.

From Vietnam to Vienna, Austria

Bui Thi Cuc's research paper, titled "ACL VMLU Benchmarks: A comprehensive benchmark toolkit for Vietnamese LLMs," aims to address the lack of evaluation tools for Vietnamese large language models.

Launched in November 2023 by Zalo AI and the Japan Advanced Institute of Science and Technology (JAIST), VMLU provides a common set of standards that helps developers of large language models (LLMs) for Vietnamese users evaluate their models and devise appropriate training strategies.

Cuc said that during the development of VMLU, the team faced numerous challenges, from building the benchmarks to ensuring data quality. The most stressful phase, however, was the paper submission process. The acceptance rate at ACL is only about 25%, and the team had to compete with many large AI research institutions worldwide.

“When we received a result at the Borderline Conference level, meaning acceptance to Findings, the whole team was happier than expected. After that, I gathered all the feedback from the reviewers, discussed it with my direct manager, and finally convinced the reviewers to raise the score so the paper would be accepted to the main conference,” Cúc recalled.

Ms. Bui Thi Cuc, representing the VMLU development team, presented the research project at the ACL conference.

VMLU is the first benchmark designed to assess the Vietnamese language comprehension of large language models. It includes four datasets with 17,000 questions that evaluate general knowledge, reading comprehension, reasoning, and dialogue.

According to Zalo engineers, most current benchmarks are designed for English, which does not fully reflect the syntactic, semantic, and cultural context of Vietnamese. Directly translating English question sets into Vietnamese often results in inaccuracies or loss of semantic nuances.

To explain the LLM evaluation framework in simple terms, imagine an AI model as a student who needs an exam to test its abilities. Most exams today are in English, but the Zalo AI team wanted to create an exam in Vietnamese to test whether the AI truly understands and uses Vietnamese well.
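The exam analogy can be made concrete with a minimal sketch of how a multiple-choice benchmark might be scored per skill. Everything in it is a hypothetical illustration: the placeholder questions, the `toy_model` stand-in, and the `accuracy_by_skill` helper are assumptions, not VMLU's actual data or tooling. Only the four skill names come from the article.

```python
from collections import defaultdict

# Hypothetical placeholder items; real VMLU questions are not reproduced here.
QUESTIONS = [
    {"skill": "general knowledge", "choices": ["A", "B", "C", "D"], "answer": "A"},
    {"skill": "general knowledge", "choices": ["A", "B", "C", "D"], "answer": "B"},
    {"skill": "reading comprehension", "choices": ["A", "B", "C", "D"], "answer": "A"},
    {"skill": "reasoning", "choices": ["A", "B", "C", "D"], "answer": "C"},
    {"skill": "dialogue", "choices": ["A", "B", "C", "D"], "answer": "A"},
    {"skill": "dialogue", "choices": ["A", "B", "C", "D"], "answer": "D"},
]


def toy_model(item):
    """Stand-in for an LLM: always picks the first choice."""
    return item["choices"][0]


def accuracy_by_skill(items, predict):
    """Grade each answer and return the fraction correct per skill."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in items:
        total[item["skill"]] += 1
        if predict(item) == item["answer"]:
            correct[item["skill"]] += 1
    return {skill: correct[skill] / total[skill] for skill in total}


scores = accuracy_by_skill(QUESTIONS, toy_model)
for skill, acc in scores.items():
    print(f"{skill}: {acc:.0%}")
```

Breaking the score down by skill, rather than reporting a single number, is what would let a team see where a model is weak, for example strong at reading comprehension but poor at reasoning.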

Mr. Nguyen Truong Son, Director of Science at Zalo AI, affirmed: “The VMLU evaluation system provides a common ‘measure’ for evaluating Vietnamese large language models. After its publication, we received a lot of positive feedback from the AI research community both domestically and internationally. In the future, I expect VMLU to become a widely used evaluation standard, not only in the academic community but also in businesses developing AI products.”

Applying AI to Zalo's products

Beyond its academic value, VMLU has many potential applications in the development of AI products at Zalo.

According to Cúc, the benchmark first helps evaluate the accuracy and language comprehension of the models used in Kiki Info, a digital citizen assistant. This allows the team to pinpoint the models' limitations in specific skills.

Secondly, VMLU is used as a testing tool before deploying new AI features, such as message summarization, automatic reply suggestions, or customer service support.

Finally, with its conversational evaluation capabilities, VMLU helps Zalo develop enterprise chatbots that communicate naturally and in line with Vietnamese communication culture.
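One way a benchmark can act as a check before a feature ships is a simple score gate. The sketch below is an assumption for illustration only, not Zalo's actual process: the `passes_gate` helper, the scores, and the threshold values are all hypothetical.

```python
def passes_gate(scores, thresholds):
    """Return (ok, failures); failures lists skills scoring below their minimum."""
    failures = [
        (skill, scores.get(skill, 0.0), minimum)
        for skill, minimum in thresholds.items()
        if scores.get(skill, 0.0) < minimum
    ]
    return (not failures, failures)


# Hypothetical benchmark scores for a candidate model and the minimums required.
candidate_scores = {"reading comprehension": 0.82, "dialogue": 0.64}
thresholds = {"reading comprehension": 0.75, "dialogue": 0.70}

ok, failures = passes_gate(candidate_scores, thresholds)
if ok:
    print("All skills meet their thresholds.")
else:
    for skill, score, minimum in failures:
        print(f"Hold release: {skill} scored {score:.2f}, below {minimum:.2f}")
```

A skill missing from the scores is treated as 0.0 here, so an unevaluated skill fails the gate rather than passing silently.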

"Our biggest goal is to create AI models that understand Vietnamese naturally and accurately," Cúc said.

Returning from Vienna, the young engineer hopes that VMLU will continue to expand and become a platform for many domestic research groups to compare and evaluate models in a unified manner.

“I hope this dataset will be the starting point for the Vietnamese AI community to develop more strongly in the coming years. We want to contribute a small part to making Vietnamese a language that global AI models understand correctly and process effectively.”