
A group of researchers has published a comprehensive map of the challenges that artificial intelligence (AI) faces in software development, along with a research roadmap for advancing the field.
Imagine a future where AI quietly takes over the tedious tasks of software development: refactoring complex code, migrating legacy systems, and tracking down race conditions, so that human software engineers can focus entirely on system architecture, design, and the creative problems that machines cannot yet solve. Recent advances in AI seem to have brought that vision within reach.
However, a new study by scientists at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) and collaborating research institutions shows that, to realize that future, we must first confront the very real challenges of the present.
“Many people are saying that programmers are no longer necessary because AI has automated everything,” said Armando Solar-Lezama, a professor of electrical engineering and computer science at MIT, a senior researcher at CSAIL, and the senior author of the study. “In reality, we’ve made very significant progress. The tools we have now are far more powerful than before. But there’s still a long way to go before we reach the full potential of automation.”
Solar-Lezama argues that the prevailing notion reduces software engineering to something like a student's programming assignment: receive a specification for a small function and write the code to implement it, or solve a LeetCode-style problem. The reality is far more complex, ranging from refactoring code to improve its design to large-scale migrations of millions of lines of COBOL to Java that fundamentally reshape a company's technology platform.
Measurement and communication remain challenging problems.
Code optimization at industrial scale, such as tuning GPU kernels or the layered optimizations inside Chrome's V8 engine, remains difficult to evaluate. Current benchmarks focus mostly on small, self-contained problems. The most widely used benchmark today, SWE-Bench, simply asks an AI model to fix a bug reported on GitHub: essentially a low-level programming exercise involving a few hundred lines of code, with a risk of data contamination, and one that ignores a host of real-world scenarios such as AI-assisted refactoring, human-AI pair programming, or rewriting high-performance systems spanning millions of lines of code. Until benchmarks expand to cover these higher-stakes scenarios, measuring progress, and thus driving it, will remain an open challenge.
Human-machine communication is another major barrier. Lead author Alex Gu, a graduate student, says that interacting with AI today feels like working through "a thin thread of communication." When he asks an AI to generate code, he often receives large, unstructured files accompanied by a handful of rudimentary test cases. The gap also shows in AI's inability to make effective use of the software tools humans rely on, such as debuggers and static analyzers.
A call to action from the community.
The authors argue that there is no magic-wand solution to these problems and call for community-scale efforts: datasets that capture how programmers actually work (which code is kept, which is discarded, how code is refactored over time, and so on); shared evaluation suites for refactoring quality, patch durability, and the correctness of system migrations; and transparent tools that let AI express uncertainty and invite human intervention.
Gu sees this as a “call to action” for large-scale open-source collaboration that no single lab can mount on its own. Solar-Lezama envisions progress coming from small, mutually reinforcing steps, “research results that sequentially address parts of the problem,” thereby transforming AI from a “code recommendation tool” into a true engineering partner.
“Why is this important? Software now underpins finance, transportation, healthcare, and nearly every daily activity. But the human effort required to build and maintain it safely is becoming a bottleneck,” Gu said. “An AI that can handle the heavy lifting without introducing hidden errors would allow programmers to focus on creativity, strategy, and ethics. But to get there, we have to understand that completing a piece of code is only the easy part; the hard part lies in everything else.”
(Adapted from MIT News)
Source: https://vietnamnet.vn/hanh-trinh-dai-cua-ai-trong-ky-thuat-phan-mem-tu-dong-hoa-2426456.html





