Interpretation of chemical reaction yields with graph neural additive network

Youngchun Kwon; Yongsik Jung; Youn-Suk Choi; Seokho Kang

doi:10.1088/2632-2153/addfaa

Machine Learning: Science and Technology (Jan 2025)

Affiliations

Youngchun Kwon: Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd, 130 Samsung-ro, Yeongtong-gu, Suwon 16678, Republic of Korea
Yongsik Jung: Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd, 130 Samsung-ro, Yeongtong-gu, Suwon 16678, Republic of Korea
Youn-Suk Choi: Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd, 130 Samsung-ro, Yeongtong-gu, Suwon 16678, Republic of Korea
Seokho Kang: Department of Industrial Engineering, Sungkyunkwan University, 2066 Seobu-ro, Jangan-gu, Suwon 16419, Republic of Korea

DOI: https://doi.org/10.1088/2632-2153/addfaa
Journal volume & issue: Vol. 6, no. 2, p. 025054

Abstract

Prediction of chemical yields is crucial for exploring untapped chemical reactions and optimizing synthetic pathways for targeted compounds. Recently, graph neural networks have proven successful in achieving high predictive accuracy. However, they remain intrinsically black-box models, offering limited interpretability. Understanding how each reaction component contributes to the yield of a chemical reaction can help identify critical factors driving the success or failure of reactions, thereby potentially revealing opportunities for yield optimization. In this study, we present a novel method for interpretable chemical reaction yield prediction, which represents the yield of a chemical reaction as a simple summation of component-wise contributions from individual reaction components. To build an interpretable prediction model, we introduce a graph neural additive network architecture, wherein shared neural networks process individual reaction components in an input reaction while leveraging a reaction-level embedding to derive their respective contributions. The predicted yield is obtained by summing these component-wise contributions. The model is trained using a learning objective designed to effectively quantify the contributions of individual components by amplifying the influence of significant components and suppressing that of less influential components. The experimental results on benchmark datasets demonstrated that the proposed method achieved both high predictive accuracy and interpretability, making it suitable for practical use in synthetic pathway design for real-world applications.
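
The additive structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the random component embeddings stand in for graph neural network outputs, the sum pooling used for the reaction-level embedding is an assumption, and the names mlp, init_params, and predict_yield are hypothetical. The training objective mentioned in the abstract is not specified there and is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)


def mlp(x, w1, b1, w2, b2):
    """Two-layer perceptron with ReLU, applied to each row of x."""
    return np.maximum(x @ w1 + b1, 0.0) @ w2 + b2


def init_params(d_in, d_hidden, d_out):
    """Random weights for the shared network (illustrative only, untrained)."""
    return (
        rng.normal(scale=0.1, size=(d_in, d_hidden)),
        np.zeros(d_hidden),
        rng.normal(scale=0.1, size=(d_hidden, d_out)),
        np.zeros(d_out),
    )


def predict_yield(component_embeddings, params):
    """Return (predicted yield, per-component contributions).

    component_embeddings: array of shape (n_components, d). In the paper these
    would come from a graph neural network; here they are plain vectors.
    """
    # Assumption: the reaction-level embedding is a sum over the components.
    reaction_embedding = component_embeddings.sum(axis=0)
    context = np.broadcast_to(reaction_embedding, component_embeddings.shape)

    # The same (shared) network scores every component, conditioned on the
    # reaction-level embedding, and outputs one scalar contribution each.
    inputs = np.concatenate([component_embeddings, context], axis=1)
    contributions = mlp(inputs, *params).ravel()

    # The prediction is the plain sum of the component-wise contributions.
    return contributions.sum(), contributions


D = 8  # embedding size (arbitrary choice for this sketch)
params = init_params(2 * D, 16, 1)
components = rng.normal(size=(4, D))  # e.g. four reaction components
predicted, contributions = predict_yield(components, params)
```

Because the prediction is a sum of one scalar per component, each entry of contributions can be read directly as that component's share of the predicted yield, which is the interpretability property the abstract highlights.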

Published in Machine Learning: Science and Technology

ISSN: 2632-2153 (Online)
Publisher: IOP Publishing
Country of publisher: United Kingdom
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics: Computer engineering. Computer hardware; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://iopscience.iop.org/journal/2632-2153
