Retrosynthesis prediction with an interpretable deep-learning framework based on molecular assembly tasks

Yu Wang; Chao Pang; Yuzhe Wang; Junru Jin; Jingjie Zhang; Xiangxiang Zeng; Ran Su; Quan Zou; Leyi Wei

doi:10.1038/s41467-023-41698-5

Nature Communications (Oct 2023)

Affiliations

Yu Wang: School of Software, Shandong University
Chao Pang: School of Software, Shandong University
Yuzhe Wang: School of Software, Shandong University
Junru Jin: School of Software, Shandong University
Jingjie Zhang: School of Software, Shandong University
Xiangxiang Zeng: College of Computer Science and Electronic Engineering, Hunan University
Ran Su: College of Intelligence and Computing, Tianjin University
Quan Zou: Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China
Leyi Wei: School of Software, Shandong University

DOI: https://doi.org/10.1038/s41467-023-41698-5
Journal volume & issue: Vol. 14, no. 1, pp. 1–15

Abstract

Automating retrosynthesis with artificial intelligence expedites organic chemistry research in digital laboratories. However, most existing deep-learning approaches are hard to explain, acting like a “black box” that offers few insights. Here, we propose RetroExplainer, which formulates the retrosynthesis task as a molecular assembly process comprising several retrosynthetic actions guided by deep learning. To ensure robust performance, we propose three units: a multi-sense and multi-scale Graph Transformer, structure-aware contrastive learning, and dynamic adaptive multi-task learning. Results on 12 large-scale benchmark datasets demonstrate the effectiveness of RetroExplainer, which outperforms state-of-the-art single-step retrosynthesis approaches. In addition, the molecular assembly process gives our model good interpretability, allowing for transparent decision-making and quantitative attribution. When extended to multi-step retrosynthesis planning, RetroExplainer identified 101 pathways, in which 86.9% of the single reactions correspond to those already reported in the literature. As a result, RetroExplainer is expected to offer valuable insights for reliable, high-throughput, and high-quality organic synthesis in drug development.

Published in Nature Communications

ISSN: 2041-1723 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Science
Website: https://www.nature.com/ncomms/
