Research Topics

Accelerated reaction analysis by GRRM with AI and algorithms.

Nakao, Atsuyuki Doctoral course
Affiliation
Graduate School of Frontier Sciences, The University of Tokyo
Keywords
path finding algorithm, machine learning, graph neural network, quantum chemical calculation

The network of chemical reactions explored by GRRM is extremely large, and the computational cost of exploring it grows rapidly with the size of the reaction system. Informatics technologies such as AI and algorithms are applied to explore this vast space efficiently, according to the research objective.

Background: GRRM-based Reaction Analysis and Importance of Algorithm for Constructing a Reaction Path Network.

Chemical reactions are nanoscale processes in which chemical bonds between atoms are formed and cleaved. GRRM, the core technology of this project, is an excellent tool for understanding these nanoscale processes, which are difficult to study experimentally. GRRM constructs a network of chemical reactions called a reaction path network (see figure below) based on the Artificial Force Induced Reaction (AFIR) method and quantum chemical calculations. In principle GRRM can explore this network exhaustively, but in practical applications it is impossible to obtain a complete network because the size of the network grows exponentially with the number of atoms in the reaction system. It is therefore necessary to focus exploration on only those areas of the network that are needed for the reaction analysis at hand.

Reaction path networks constructed by GRRM have a variety of applications. For example, they are used to analyze reaction mechanisms, predict unknown reactions, and search for the reactants needed to synthesize desired compounds. The area of the network that needs to be explored differs for each of these purposes, so exploration must be controlled according to the purpose. If the exploration area can be controlled appropriately, a reaction path network that yields useful information can be constructed efficiently. With limited computational resources, this makes it possible to treat larger reaction systems or to explore with more accurate quantum chemical calculations, making GRRM-based reaction analysis more useful.

Figure 1. Nanoscale understanding of chemical reactions, and a reaction path network.

Approach: Integration of Informatics and GRRM.

In this study, informatics was applied to improve the efficiency of reaction path network construction. Formulated in informatics terms, this is a problem of sequential exploration on a reaction path network, which is a graph whose structure is unknown in advance. Such exploration problems on graphs have been studied and applied in a variety of fields, and applying those findings to our problem is very useful. For example, analyzing the mechanism of an experimentally observed reaction corresponds to finding a path on a graph from the node corresponding to the reactant to the node corresponding to the product. Path finding algorithms for solving such problems have been actively studied in train route navigation, robot control, and other areas. AI technology, which has made remarkable progress in various fields in recent years, can also be expected to improve exploration efficiency. If knowledge of the various reaction paths accumulated through this project can be extracted by AI technology and used in later explorations, those explorations can be conducted more efficiently.
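The correspondence between mechanism analysis and path finding can be illustrated with a minimal sketch. It runs Dijkstra's algorithm, a standard shortest-path method and not the method proposed in this study, on a hypothetical toy network. The structure names (`int_A`, `int_B`, `int_C`) and the edge costs are illustrative assumptions, not quantum chemical data.

```python
import heapq

def shortest_path(graph, start, goal):
    """Dijkstra's algorithm: return (total_cost, path), or (inf, []) if unreachable."""
    queue = [(0.0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return float("inf"), []

# Hypothetical toy network: keys are structures, values map each neighboring
# structure to an illustrative edge cost (an assumption, not real data).
network = {
    "reactant": {"int_A": 2.0, "int_B": 5.0},
    "int_A": {"reactant": 2.0, "int_C": 2.0},
    "int_B": {"reactant": 5.0, "product": 1.0},
    "int_C": {"int_A": 2.0, "product": 1.0},
    "product": {"int_B": 1.0, "int_C": 1.0},
}

cost, path = shortest_path(network, "reactant", "product")
print(cost, path)
```

In this toy case the whole graph is known before the search starts. In an actual exploration the graph is revealed only as calculations are performed, which is why the order of exploration matters.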

A standard reaction path network exploration by GRRM follows the cycle shown in the figure below. In each cycle, a node (structure) in the network is selected and an AFIR calculation is performed on it. This calculation applies an artificial force to the structure, inducing a transition to an adjacent structure. The network is then expanded by adding the newly generated structures. Given a starting structure, this cycle is repeated to explore the reaction path network sequentially. The order in which structures are selected, and which artificial force is applied to each, determines which area of the network is explored. In this study, informatics technology supports this selection to make exploration efficient.

Figure 2. The flow chart of a reaction path network exploration by GRRM.
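The cycle described above can be sketched as a loop over a priority queue. This is a minimal illustration under stated assumptions, not GRRM's implementation: `run_afir_stub` is a hypothetical stand-in that looks up neighbors in a small hidden graph instead of performing an AFIR calculation, and `priority` is a hypothetical scoring function that represents the selection policy.

```python
import heapq

# Hidden "true" network standing in for what AFIR calculations would reveal.
# Purely illustrative; real neighbors come from quantum chemical calculations.
HIDDEN_NETWORK = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D", "E"],
    "D": ["B", "C"],
    "E": ["C"],
}

def run_afir_stub(structure):
    """Hypothetical stand-in for an AFIR calculation on one structure."""
    return HIDDEN_NETWORK[structure]

def explore(start, priority, budget):
    """Expand the network one structure per cycle until the budget is spent.

    priority(structure) returns a number; lower values are selected first.
    """
    known_edges = set()
    discovered = {start}
    frontier = [(priority(start), start)]
    order = []
    while frontier and len(order) < budget:
        _, structure = heapq.heappop(frontier)       # 1. select a structure
        order.append(structure)
        for neighbor in run_afir_stub(structure):    # 2. "apply the force"
            known_edges.add(frozenset((structure, neighbor)))
            if neighbor not in discovered:           # 3. add new structures
                discovered.add(neighbor)
                heapq.heappush(frontier, (priority(neighbor), neighbor))
    return order, discovered, known_edges

# Two selection policies produce two different exploration orders.
alphabetical = explore("A", priority=lambda s: ord(s), budget=3)
reverse = explore("A", priority=lambda s: -ord(s), budget=3)
print(alphabetical[0], reverse[0])
```

With the same starting structure and the same budget of three calculations, the two policies expand different structures. This is the degree of freedom that a path finding algorithm or a machine learning model can control.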

Research Results: Reaction Path Network Exploration Based on Pathfinding Algorithm and Machine Learning.

In this study, we proposed two methods, each suited to a different purpose of chemical reaction analysis. The first method, RRT/SC-AFIR, explores reaction paths between the reactants and products of a pre-specified reaction. While it is relatively easy to identify the reactants and products of a chemical reaction experimentally, it is not easy to determine the actual reaction mechanism, such as the order in which chemical bonds are formed and cleaved. By exploring reaction paths on a reaction path network, this method can efficiently propose reaction mechanisms based on quantum chemical calculations. As mentioned above, this problem corresponds to a path finding problem on a graph. We proposed a method that improves the efficiency of this type of exploration by using the rapidly-exploring random tree (RRT), a path finding method often used in robot control. In addition to reaction kinetics information obtained from quantum chemical calculations, similarities between structures can be used to quickly find reaction paths connecting reactants and products.
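For readers unfamiliar with RRT, the following is a minimal sketch of the generic algorithm in an obstacle-free two-dimensional space, as it is commonly presented for robot motion planning. It is not the RRT/SC-AFIR method. As a loose analogy, and only as an assumption made for illustration, points can be read as structures and Euclidean distance as structural dissimilarity. All parameter names and values (`step`, `goal_bias`, `tol`) are hypothetical.

```python
import math
import random

def rrt(start, goal, bounds, step=0.5, goal_bias=0.2, tol=0.5,
        max_iters=2000, seed=0):
    """Grow a tree from start until a node lies within tol of goal.

    Returns the path as a list of points, or None if the budget runs out.
    """
    rng = random.Random(seed)
    parent = {start: None}
    for _ in range(max_iters):
        # Occasionally sample the goal itself to pull the tree toward it.
        if rng.random() < goal_bias:
            sample = goal
        else:
            sample = (rng.uniform(*bounds), rng.uniform(*bounds))
        # Extend from the tree node that is most similar (closest) to the sample.
        nearest = min(parent, key=lambda p: math.dist(p, sample))
        d = math.dist(nearest, sample)
        if d == 0.0:
            continue
        # Move at most one step from the nearest node toward the sample.
        scale = min(1.0, step / d)
        new = (nearest[0] + scale * (sample[0] - nearest[0]),
               nearest[1] + scale * (sample[1] - nearest[1]))
        if new in parent:
            continue
        parent[new] = nearest
        if math.dist(new, goal) <= tol:
            # Walk back through the parents to recover the path.
            path = [new]
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            return path[::-1]
    return None

path = rrt(start=(0.0, 0.0), goal=(5.0, 5.0), bounds=(0.0, 6.0))
print(len(path), path[0], path[-1])
```

The two ingredients visible here, random sampling to spread the tree and a nearest-node rule based on a distance measure, are what make a similarity measure between structures useful for steering a search toward the product.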

The second method, GNN/SC-AFIR, uses machine learning to select how the artificial force is applied in AFIR calculations. Although various GRRM-based reaction analyses are currently being conducted, each exploration is independent, and it has been difficult to apply the findings of earlier explorations directly to the next one. If those findings can be used effectively, a systematic database of reaction path networks is expected to greatly enhance the efficiency of exploring new reaction systems. GNN/SC-AFIR uses a model based on graph neural networks, a type of deep neural network. The proposed machine learning model can flexibly handle the two types of input needed to control a GRRM exploration: the three-dimensional structure of a reaction system and the artificial force used in the AFIR calculation. The method was validated on the task of constructing a reaction network containing a greater variety of compounds. As shown in the figure below, it was able to construct a network containing more than twice as many diverse structures. A reaction path network that contains more structures allows more potential reactions to be considered. In other words, machine learning enabled us to construct a more useful reaction path network efficiently.

Figure 3. Structural diversity enhancement by GNN/SC-AFIR and reaction path networks constructed by each method.

Outlook: Quantum Chemistry, Algorithms, and Database-Based Acceleration of Reaction Discovery.

While GRRM is an excellent tool for discovering reactions based on quantum chemical calculations, purpose-specific exploration algorithms are essential to realize its full potential. By combining quantum chemistry with algorithms tailored to the purpose of the exploration and with AI techniques that make effective use of existing databases, a wider variety of reaction analysis and discovery based on quantum chemical calculations is expected to become possible in the future.