
A remarkable meeting where mathematicians seek to beat artificial intelligence.

The world's leading mathematicians secretly met to find a way to beat artificial intelligence (AI), but were astonished by AI's capabilities.

VietnamPlus | 20/05/2025

One weekend in mid-May, a closed-door meeting took place: thirty of the world's leading mathematicians quietly traveled to Berkeley, California, USA, to face off against a chatbot capable of "reasoning." The chatbot was tasked with solving problems the mathematicians themselves had devised, as a test of its problem-solving ability.

After bombarding it with professor-level questions for two straight days, the mathematicians were astonished to discover that the chatbot could solve some of the most difficult problems ever solved.

"I've seen colleagues say outright that this large-scale language model is approaching the level of mathematical genius," Ken Ono, a professor at the University of Virginia and the chair and judge of the meeting, told Scientific American.

The chatbot in question is built on o4-mini, a large language model (LLM) designed for complex reasoning. This OpenAI model is trained to carry out sophisticated chains of reasoning. Google's Gemini 2.5 Flash has comparable capabilities.

Like the LLMs behind earlier versions of ChatGPT, o4-mini learns to predict the next word in a string of text. The difference is that o4-mini is a lighter, more flexible model, trained on more specialized data with closer human tuning, which lets it dig into mathematical problems that earlier models could not reach.

To challenge and assess o4-mini's capabilities, OpenAI commissioned Epoch AI, a non-profit organization that specializes in benchmarking LLMs, to create 300 previously unpublished mathematics questions. Traditional LLMs can solve many complex problems, yet when confronted with entirely new questions, most answered fewer than 2% correctly, suggesting they lack genuine reasoning ability.

For its latest evaluation project, Epoch AI recruited Elliot Glazer, a recent mathematics PhD, as its lead. The project, called FrontierMath, was launched in September 2024.

The project collected new questions across four difficulty levels, ranging from undergraduate and graduate level to deep research. By April 2025, Glazer found that o4-mini could solve about 20% of the problems, so he moved straight to level 4: problems that even highly accomplished mathematicians would struggle with.

Participants were required to sign a confidentiality agreement and to communicate only through the encrypted messaging app Signal, since email could be scanned and its content "learned" by an LLM, contaminating the evaluation data.

For every problem that o4-mini could not solve, the problem's author would receive a $7,500 prize.

The initial working group produced questions slowly but steadily, so Glazer decided to speed things up by organizing a two-day in-person meeting on May 17–18. Thirty mathematicians attended, split into groups of six, competing with one another not to solve problems but to devise problems the AI could not solve.

By the evening of May 17, Ken Ono was growing frustrated with the chatbot, whose mathematical ability far exceeded expectations and which was proving difficult to "trap." "I came up with a problem that experts in my field would recognize as an open question in number theory, a good PhD-level problem," he recounted.

When he posed it to o4-mini, he was stunned to watch the chatbot analyze, reason, and reach the correct solution in just ten minutes. In the first two minutes it tracked down and absorbed the relevant literature; it then suggested trying a simpler version of the problem first to learn the approach.

Five minutes later, the chatbot delivered the correct answer in a confident, even somewhat arrogant, tone. “It started getting cheeky,” Ono recounted. “It even added: ‘No citation needed, I computed the mystery number myself!’”

Having been bested by the AI, Ono sent an alert message to the team via Signal on the morning of May 18. “I was completely unprepared to deal with a model like this,” he said. “I had never seen this kind of reasoning in a computer model. It thinks the way a real scientist thinks. And that was terrifying.”

Although the mathematicians eventually managed to find ten questions that stumped o4-mini, they could not hide their astonishment at how far AI had advanced in just one year.

Ono compared working with o4-mini to collaborating with an extremely talented colleague. Yang-Hui He, a mathematician at the London Institute for Mathematical Sciences and a pioneer in applying AI to mathematics, commented: “This is what a very, very good graduate student can do — and more than that.”

It is also worth noting that the AI works far faster than humans: problems that would take human researchers weeks or months to solve, o4-mini dispatched in a few minutes.

The excitement of the battle of wits with o4-mini came with considerable concern. Both Ono and He warned that o4-mini's capabilities could breed overconfidence. “We have proof by induction, proof by contradiction, and now proof by intimidation,” He said. “If you state something with enough confidence, people get intimidated. I think o4-mini has mastered that kind of proof: everything it says, it says with great certainty.”

As the meeting concluded, the mathematicians began to ponder the future of their field. They discussed the possibility of a “fifth level”: questions that even the world’s best mathematicians cannot solve. If AI reaches that level, the mathematician’s role would change dramatically; mathematicians might become question-posers, interacting with and guiding the AI’s reasoning to uncover new mathematical truths, much as a professor works with graduate students.

“I’ve been telling my colleagues for a while now that it would be a grave mistake to assume that artificial general intelligence will never arrive, that it’s just a computer,” Ono said. “I don’t want to add to the panic, but in some respects these large language models have already begun to outperform most of the world’s best graduate students.”

(Vietnam+)

Source: https://www.vietnamplus.vn/cuoc-gap-go-dac-biet-noi-cac-nha-toan-hoc-tim-cach-danh-bai-tri-tue-nhan-tao-post1043183.vnp

