AI's intelligence is being challenged with the game Super Mario

According to TechCrunch, many people think that Pokémon is the toughest test for artificial intelligence (AI). But the challenges have not stopped there: researchers at the University of California San Diego (USA) recently launched a new one using the game Super Mario Bros. The results show that not all AI models can successfully 'reach the finish line'.

Image 1: Mario games are being used to test the performance of large AI models

Super Mario poses a huge challenge for AI models

Hao AI Labs took AI into the world of Mario to test the capabilities of today's leading language models. The results showed that Anthropic's Claude 3.7 performed best, followed by Claude 3.5. Meanwhile, Google's Gemini 1.5 Pro and OpenAI's GPT-4o had more difficulty playing the game on their own.

It's worth noting that this isn't quite the original 1985 Super Mario Bros. The game runs on an emulator that is integrated with a framework called GamingAgent, which lets the AI control Mario. GamingAgent gives the AI basic instructions along with screenshots of the game. The AI then generates Python code to control the character.
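The loop described above (instructions and a screenshot go in, Python code comes out and is run against the game) can be pictured with a rough sketch. Everything here is hypothetical and for illustration only: `FakeEmulator`, `stub_model`, `run_agent`, and the `press` helper are invented stand-ins, not GamingAgent's actual interface, and the "model" simply returns a fixed snippet.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical instruction text; the real prompts are not given in the article.
INSTRUCTIONS = "You control Mario. Reply with Python that calls press(button)."


@dataclass
class FakeEmulator:
    """Stand-in for an emulator: records button presses instead of running a game."""
    pressed: List[str] = field(default_factory=list)

    def screenshot(self) -> bytes:
        # A real setup would return image data; this is a placeholder frame.
        return b"fake-frame-%d" % len(self.pressed)

    def press(self, button: str) -> None:
        self.pressed.append(button)


def stub_model(instructions: str, frame: bytes) -> str:
    """Stand-in for a language model: always answers 'move right, then jump'."""
    return "press('right')\npress('A')"


def run_agent(emulator: FakeEmulator,
              model: Callable[[str, bytes], str],
              steps: int) -> List[str]:
    for _ in range(steps):
        frame = emulator.screenshot()        # 1. capture the current screen
        code = model(INSTRUCTIONS, frame)    # 2. ask the model for Python code
        # 3. run the generated code, exposing only the press() helper.
        #    Note: exec() on model output is not a real sandbox.
        exec(code, {"__builtins__": {}}, {"press": emulator.press})
    return emulator.pressed


if __name__ == "__main__":
    print(run_agent(FakeEmulator(), stub_model, steps=2))
```

Swapping `stub_model` for a call to an actual model would be the only change needed to turn the sketch into a live loop, which is also where response time starts to matter.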

According to Hao AI Labs, the game forces models to 'learn' how to plan complex moves and build playing strategies. Interestingly, 'reasoning' models like OpenAI's o1, which are stronger on most benchmarks, struggle more than 'non-reasoning' models.

The reason given is that reasoning models take time to make decisions, while Super Mario Bros. requires quick reflexes. A second of delay can lead to failure.
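To get a feel for why a delay hurts, a back-of-the-envelope calculation helps. The sketch below assumes the game advances at roughly 60 frames per second (the approximate rate of the original console) and uses made-up latency values purely for illustration; none of these numbers come from the Hao AI Labs experiment.

```python
FRAMES_PER_SECOND = 60  # assumption: approximate frame rate of the original game


def frames_elapsed(latency_seconds: float, fps: int = FRAMES_PER_SECOND) -> int:
    """Number of game frames that pass while the model is still deciding."""
    return round(latency_seconds * fps)


# Illustrative latencies only, not measured values.
for latency in (0.1, 1.0, 5.0):
    print(f"{latency:>4.1f} s of thinking = {frames_elapsed(latency)} frames of game time")
```

Under these assumptions, a model that needs a full second to respond is acting on a screen that is already 60 frames out of date.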

Games have been used to evaluate AI for a long time, but many experts are skeptical about how accurate this method is. They argue that, unlike the real world, games are relatively simple and provide a practically unlimited amount of data for training AI, so results in games do not reflect the true capabilities of AI in the real world.

Andrej Karpathy, a former research scientist at OpenAI, calls this an 'evaluation crisis'. He admits that there is currently no accurate metric for assessing AI capabilities.

While the debate over the accuracy of evaluating AI through games continues, watching AI 'fight' its way through Mario's world is still an interesting experience and helps people better understand what AI can do.