New AI model accurately predicts chemical reactions using conservation of mass

Many attempts have been made to harness the power of artificial intelligence (AI) and large language models (LLMs) to predict the outcomes of new chemical reactions. However, success has been limited, largely because these models are not tied to fundamental physical principles such as the law of conservation of mass.

Now, a team at MIT has found a way to incorporate physical constraints into reaction prediction models, significantly improving the accuracy and reliability of the results.

The FlowER (Flow matching for Electron Redistribution) system allows detailed tracking of the movement of electrons, ensuring that no electrons are artificially added or lost. Photo: MIT News

The work, published on August 20 in the journal Nature, was co-authored by Joonyoung Joung (now an assistant professor at Kookmin University, South Korea), former software engineer Mun Hong Fong (now at Duke University), chemical engineering graduate student Nicholas Casetti, postdoctoral researcher Jordan Liles, physics student Ne Dassanayake, and senior author Connor Coley, the Class of 1957 Career Development Professor in the Department of Chemical Engineering and the Department of Electrical Engineering and Computer Science.

Why is reaction prediction important?

“Predicting the outcome of a reaction is a very important task,” Joung explains. For example, if you want to make a new drug, “you need to know how to synthesize it. This requires knowing which products are likely to appear” from a set of starting materials.

Previous attempts have often looked only at inputs and outputs, ignoring the intermediate steps and physical constraints such as the fact that mass cannot be created or destroyed.

Joung points out that, while LLMs like ChatGPT have had some success in research, they lack a mechanism to ensure that their results follow the laws of physics. “Without conserving the ‘tokens’ (which represent atoms), LLMs will arbitrarily create or destroy atoms in the reaction,” he says. “This is more like alchemy than science.”

The FlowER solution: an old foundation applied to new technology

To overcome this, the team used a 1970s method developed by chemist Ivar Ugi – the bond-electron matrix – to represent electrons in a reaction.

Based on that, they developed the FlowER (Flow matching for Electron Redistribution) program, which allows detailed tracking of the movement of electrons, ensuring that no electrons are artificially added or lost.

The matrix uses nonzero values to represent bonds or lone electron pairs, and zeros to represent the absence of either. “This allows us to conserve both atoms and electrons,” Fong explains. This is key to building mass conservation into the model.
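To make the bookkeeping concrete, here is a minimal Python sketch of a bond-electron matrix for one textbook reaction, a proton combining with hydroxide to form water. It is an illustration only, not FlowER's code. It assumes the usual Ugi-style convention, which the article describes only loosely: an off-diagonal entry holds the bond order between two atoms, and a diagonal entry holds the number of free (non-bonding) valence electrons on an atom. The function names and the atom ordering are made up for this example.

```python
# Illustrative sketch only; this is not FlowER's code.
# Assumed convention (Ugi-style bond-electron matrix):
#   be[i][j], i != j : bond order between atoms i and j (symmetric)
#   be[i][i]         : free (non-bonding) valence electrons on atom i
# Atom order (an arbitrary choice for this example):
#   0 = H of the incoming proton, 1 = O, 2 = H already bonded to O


def total_electrons(be):
    """Sum every entry. Each bond appears twice (i,j and j,i),
    so a single bond contributes its two shared electrons."""
    return sum(sum(row) for row in be)


def is_symmetric(be):
    """A valid bond-electron matrix must be symmetric."""
    n = len(be)
    return all(be[i][j] == be[j][i] for i in range(n) for j in range(n))


def reaction_matrix(reactants, products):
    """Entry-wise difference: how the electrons were redistributed."""
    return [
        [p - r for r, p in zip(r_row, p_row)]
        for r_row, p_row in zip(reactants, products)
    ]


def conserves_electrons(reactants, products):
    """Electrons are conserved when the redistribution sums to zero."""
    return total_electrons(reaction_matrix(reactants, products)) == 0


# H+ and OH-: the proton has no electrons; O has three lone pairs
# (6 free electrons) and one single bond to its hydrogen.
REACTANTS = [
    [0, 0, 0],
    [0, 6, 1],
    [0, 1, 0],
]

# H2O: one lone pair on O becomes the new O-H bond, leaving 4 free electrons.
PRODUCTS = [
    [0, 1, 0],
    [1, 4, 1],
    [0, 1, 0],
]

# An invalid "product": O gains a bond but keeps all 6 free electrons,
# so two electrons appear from nowhere.
BAD_PRODUCTS = [
    [0, 1, 0],
    [1, 6, 1],
    [0, 1, 0],
]

if __name__ == "__main__":
    print(reaction_matrix(REACTANTS, PRODUCTS))
    print(conserves_electrons(REACTANTS, PRODUCTS))
    print(conserves_electrons(REACTANTS, BAD_PRODUCTS))
```

Because the same atoms index the rows and columns before and after the reaction, atoms cannot appear or vanish, and the zero-sum check on the difference matrix catches any electrons that are created or lost. A predictor that outputs free text has no such built-in check.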

Early but promising evidence

According to Coley, the current system is only a demonstration: a proof of concept showing that the “flow matching” method is well suited to predicting chemical reactions.

Although the model was trained on data from more than a million chemical reactions collected from the US Patent Office, that dataset still lacks reactions involving metals and catalysis.

“We’re excited that the system can reliably predict the reaction mechanism,” Coley said. “It conserves mass, it conserves electrons, but there are certainly ways to expand and improve the robustness in the coming years.”

The model is now publicly available on GitHub. Coley hopes it will be a useful tool for assessing reactivity and mapping out reaction pathways.

Open data sources and wide application potential

“We made everything public, from the model to the data to a previous dataset built by Joung that details the mechanistic steps of known reactions,” Fong said.

According to the team, FlowER can match or exceed existing methods in finding standard mechanisms, while also generalizing to previously unseen classes of reactions. Potential applications span pharmaceutical chemistry, materials discovery, combustion research, atmospheric chemistry, and electrochemical systems.

Compared to other systems, Coley notes: “With the architectural choices we made, we achieve a quantum leap in validity and conservation, while maintaining or slightly improving accuracy.”

What’s unique, says Coley, is that the model doesn’t “invent” mechanisms, but rather infers them based on experimental data from patent literature. “We’re extracting mechanisms from experimental data—something that’s never been done and shared at this scale.”

Next steps

The team plans to expand the model’s understanding of metals and catalysis. “We’ve only scratched the surface,” Coley admits.

In the long term, he believes the system could help discover new complex reactions, as well as shed light on previously unknown mechanisms. “The long-term potential is huge, but this is just the beginning.”

The research was supported by the Machine Learning for Pharmaceutical Discovery and Synthesis consortium and the US National Science Foundation (NSF).

(Source: MIT)

Source: https://vietnamnet.vn/moi-hinh-ai-moi-du-doan-phan-ung-hoa-hoc-chinh-xac-nho-bao-toan-khoi-luong-2444232.html