AI's fatal weakness

Research indicates that despite bold claims about AI's programming capabilities, error handling remains an area where humans excel.

ZNews, 12/04/2025

AI is not yet able to replace humans in programming. Photo: John McGuire.

Leading AI models from OpenAI, Anthropic, and Google are increasingly being used for programming. ChatGPT and Claude have gained more memory and processing power to analyze hundreds of lines of code, while Gemini offers a dedicated Canvas feature that displays results for programmers.

In October 2024, Sundar Pichai, CEO of Google, stated that 25% of new code at the company was generated by AI. Mark Zuckerberg, CEO of Meta, also expressed ambitions to widely deploy AI coding models within the corporation.

However, a new study from Microsoft Research, Microsoft's R&D division, shows that AI models, including Anthropic's Claude 3.7 Sonnet and OpenAI's o3-mini, fail to fix many of the bugs in a software development benchmark called SWE-bench Lite.

The study's authors tested nine different AI models, each serving as the backbone of a single-prompt agent with access to a range of debugging tools, including a Python debugger. The agent was tasked with fixing 300 software bugs selected from the SWE-bench Lite dataset.

Success rate when solving programming problems from the SWE-bench Lite dataset. Image: Microsoft.

Even with newer, more powerful models, the AI agent rarely completed more than half of the assigned debugging tasks. Among the models tested, Claude 3.7 Sonnet achieved the highest average success rate at 48.4%, followed by OpenAI's o1 at 30.2% and o3-mini at 22.1%.

The authors cite several reasons for the low performance. Some models did not understand how to use the debugging tools provided. A bigger problem, according to the authors, is a shortage of suitable training data.

They argue that the models' training data still lacks examples that capture the debugging steps humans take from start to finish. In other words, the AI has not learned enough about how humans think and act step by step when dealing with a real-world software bug.

The authors believe that training and fine-tuning the models will make them more proficient at debugging software. "However, this will require specialized datasets for the training process," they stated.

Numerous studies have found that AI introduces security vulnerabilities and errors when generating code, owing to weaknesses such as a limited understanding of programming logic. A recent evaluation of Devin, an AI-powered programming tool, found that it completed only 3 of 20 programming tests.

The programming capabilities of AI remain a subject of much debate. Kevin Weil, Chief Product Officer of OpenAI, previously suggested that AI would surpass human programmers by the end of this year.

On the other hand, Bill Gates, co-founder of Microsoft, believes that programming will remain a sustainable career in the future. Other leaders such as Amjad Masad (CEO of Replit), Todd McKinnon (CEO of Okta), and Arvind Krishna (CEO of IBM) have also voiced their support for this view.

Although its findings are not new, Microsoft's research serves as a reminder to programmers and their managers to think carefully before handing full control of coding over to AI.

Source: https://znews.vn/diem-yeu-chi-mang-cua-ai-post1545220.html

