Alibaba's R1-Omni model can infer a person's emotional state in a video. Photo: Xpert.Digital
According to Bloomberg, Alibaba's Tongyi Lab released the R1-Omni model as open source on March 11.
The most notable feature of this model is that it can infer the emotional state of a person in a video, while also describing clothing and surroundings.
This marks a step forward in computer vision. R1-Omni is an upgraded version of the earlier open-source HumanOmni model, developed by the same Alibaba principal researcher, Jiaxing Zhao. Alibaba has made R1-Omni freely available for download on the Hugging Face platform.
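As a rough illustration of how such an open-source release can be fetched from Hugging Face, the sketch below uses the huggingface_hub client; the repository ID is a placeholder assumption, not a confirmed identifier from the article.

```python
# Minimal sketch: downloading an open-source model snapshot from Hugging Face.
# The repository ID below is a placeholder assumption; check Hugging Face for
# the actual R1-Omni repository published by Alibaba's Tongyi Lab.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="organization/R1-Omni",  # placeholder, not a confirmed repo ID
    local_dir="./r1-omni",           # where the model files will be stored
)
print(f"Model files downloaded to: {local_dir}")
```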
According to the accompanying research report, R1-Omni can better understand how visual and auditory information contributes to emotion recognition. To improve recognition across both visual and audio modalities, the model is trained with reinforcement learning algorithms.
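The article does not describe how the training signal is computed, but a verifiable reward of the following kind is a common choice when reinforcement learning is applied to a labeled recognition task. The label set and scoring scheme below are assumptions for illustration only, not R1-Omni's actual method.

```python
# Toy sketch of a verifiable reward for emotion recognition, of the kind a
# reinforcement-learning fine-tuning loop could optimize. The label set and
# scoring scheme are assumptions for illustration; the article does not
# specify how R1-Omni's reward is computed.

EMOTIONS = {"happy", "sad", "angry", "surprised", "neutral"}  # assumed label set


def emotion_reward(predicted: str, ground_truth: str) -> float:
    """Return 1.0 for a correct emotion label, 0.0 for a wrong one,
    and a penalty for output that is not a recognized label."""
    predicted = predicted.strip().lower()
    if predicted not in EMOTIONS:
        return -1.0  # malformed output: not a recognized emotion label
    return 1.0 if predicted == ground_truth else 0.0


# Example usage on single (prediction, label) pairs.
print(emotion_reward("Happy", "happy"))   # 1.0
print(emotion_reward("confused", "sad"))  # -1.0 (not in the assumed label set)
```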
Reinforcement learning is a machine learning technique focused on decision-making by automated agents, including advanced AI software, robots, and self-driving cars.
These agents learn to perform a task through trial and error, without direct human guidance. The technique is an important element of AI model development because it addresses sequential decision-making problems in uncertain environments, as the sketch below illustrates.
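A minimal Q-learning loop shows this trial-and-error process in miniature: the agent tries actions, observes rewards, and gradually learns which action is best. The two-state environment, reward scheme, and hyperparameters are arbitrary assumptions for illustration and are unrelated to R1-Omni's actual training setup.

```python
import random

# Toy trial-and-error learning: an agent repeatedly tries actions in a tiny
# two-state environment and updates its value estimates from the rewards it
# receives, with no human-provided guidance. Generic Q-learning illustration,
# not code from R1-Omni.

N_STATES, N_ACTIONS = 2, 2
q_table = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
alpha, gamma, epsilon = 0.1, 0.9, 0.2  # learning rate, discount, exploration rate


def step(state: int, action: int) -> tuple[int, float]:
    """Hypothetical environment: action 1 in state 0 pays off, others do not."""
    reward = 1.0 if (state == 0 and action == 1) else 0.0
    next_state = random.randint(0, N_STATES - 1)  # uncertain transition
    return next_state, reward


state = 0
for _ in range(1000):
    # Explore occasionally, otherwise exploit the best-known action.
    if random.random() < epsilon:
        action = random.randint(0, N_ACTIONS - 1)
    else:
        action = max(range(N_ACTIONS), key=lambda a: q_table[state][a])
    next_state, reward = step(state, action)
    # Q-learning update: move the estimate toward reward + discounted future value.
    best_next = max(q_table[next_state])
    q_table[state][action] += alpha * (reward + gamma * best_next - q_table[state][action])
    state = next_state

print(q_table)  # action 1 in state 0 should end up with the highest value
```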
The results of the study show that the R1-Omni model has stronger reasoning, insight, and generalization capabilities than other models.
Source: https://znews.vn/ai-trung-quoc-doc-duoc-cam-xuc-con-nguoi-post1537948.html