AI models need benchmarks that assess complex capabilities in depth

VMLU, a platform for learning, evaluating and ranking Vietnamese-language large language models (LLMs), reports in its 2024 status report a sharp increase in the number of LLMs focused on Vietnamese. In 2024, the platform listed 45 LLMs on its leaderboards, received evaluation requests from more than 155 organizations and individuals, and recorded 691 downloads of its benchmark and 3,729 LLM evaluations.

Many domestic and foreign organizations use VMLU, including VinBigData, VNPT AI, Viettel Solutions, the University of Technology (VNU-HCM), UONLP x Ontocord at the University of Oregon (USA), DAMO Academy (Alibaba Group), and the SDSRV teams at Samsung...

VMLU launched its first set of LLM evaluation benchmarks in 2023.

As the number of LLMs has grown, their quality has also steadily improved. Whereas LLMs used to be trained mainly on basic knowledge, developers now focus on adding skills such as reading comprehension, conversation, and human-like reasoning.

In response to the rapid development of advanced Vietnamese LLMs, VMLU has published new benchmark sets that assess the models' complex capabilities in greater depth.

Standards that promote LLM excellence

Previously, because the market lacked quality standards, many domestic research groups had to build internal evaluation tools based on their own criteria. This made it hard for them to evaluate their models and compare them with existing LLMs on the market, and therefore to choose appropriate training strategies.

To address this problem, in November 2023 VMLU, the first common "Make in Vietnam" benchmark, was developed by a team of leading Vietnamese experts and released free of charge to the community.

The benchmark, consisting of 10,880 multiple-choice questions covering 58 topics across several difficulty levels, gives developers easy access to a general-purpose evaluation dataset. Developers can also use VMLU's leaderboards to compare their models directly with existing LLMs on the market.

Dr. Dang Tran Thai, Head of the Natural Language Processing Department in VinBigData's Virtual Assistant Technology Division, whose ViGPT-1.6B-v1 model appears on VMLU's leaderboard of from-scratch models (LLMs trained from scratch), said: “VMLU has relatively complete and comprehensive data for evaluating the knowledge of LLMs in Vietnamese. It is useful not only for evaluating LLM quality at each stage of development, but also for measuring the effectiveness of our experiments during training.”

“This will be a springboard for the development of AI in general and LLMs in particular, because we need good standards as a basis for training high-quality models,” Dr. Dang Tran Thai added.

Dr. Bach Hung Nguyen, Principal Engineer at Microsoft, also affirmed that VMLU is useful for evaluating the performance of LLMs in Vietnamese, helping developers better understand their models' capabilities. He also hopes VMLU will add benchmarks for useful skills such as reasoning, code generation, and text summarization.

New version of VMLU aims to refine more advanced LLMs

VMLU has recently announced a new set of benchmarks that assess the reasoning and interaction abilities of LLMs. The expanded set covers three core skills of a modern LLM:

Reading Comprehension (ViSQuAD): 3,310 questions that assess the ability to understand text in depth and handle complex questions, taking into account the specific characteristics of the Vietnamese language and context.

Reasoning (ViDrop): 3,090 questions that challenge an LLM's logical reasoning through tasks such as comparison, counting, and arithmetic.

Interaction (ViDialog): 210 dialogues that assess coherence, contextual understanding, and the ability to apply multidisciplinary knowledge (history, geography, logic) in conversation.

This upgrade not only helps developers evaluate models more comprehensively, but also pushes LLMs to deliver more value to end users.

New VMLU benchmarks released in 2025.

Dr. Chau Thanh Duc, Director of AI Research & Development at Zalo AI, the organization that developed VMLU, said: “There are currently hundreds of benchmarks worldwide for evaluating the capabilities of large language models. However, benchmarks designed specifically for Vietnamese remain very limited. With the benchmarks launched in 2023 and 2025, we hope to broaden the range of capabilities being evaluated.”

The new benchmarks are now available on the VMLU website https://vmlu.ai/ for individuals and research groups to evaluate their models.

The new set of standards has been updated on the VMLU website.

VMLU is a platform for evaluating and ranking Vietnamese LLMs, built by Zalo AI in collaboration with the Japan Advanced Institute of Science and Technology (JAIST) and offered free of charge to the community since November 2023. By supporting the Vietnamese AI community, VMLU helps strengthen Vietnamese people's ability to master new technologies, contributing to the country's new era of development, which is driven by breakthroughs in science, technology, innovation, and national digital transformation.

Source: https://znews.vn/mo-hinh-ai-dang-can-bo-tieu-chuan-danh-gia-sau-cac-nang-luc-phuc-tap-post1589901.html