Retrosynthesis prediction with an iterative string editing model

Yuqiang Han; Xiaoyang Xu; Chang-Yu Hsieh; Keyan Ding; Hongxia Xu; Renjun Xu; Tingjun Hou; Qiang Zhang; Huajun Chen

doi:10.1038/s41467-024-50617-1

Nature Communications (Jul 2024)

Affiliations

Yuqiang Han: College of Computer Science and Technology, Zhejiang University
Xiaoyang Xu: Polytechnic Institute, Zhejiang University
Chang-Yu Hsieh: Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University
Keyan Ding: College of Computer Science and Technology, Zhejiang University
Hongxia Xu: Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University
Renjun Xu: College of Computer Science and Technology, Zhejiang University
Tingjun Hou: Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University
Qiang Zhang: College of Computer Science and Technology, Zhejiang University
Huajun Chen: College of Computer Science and Technology, Zhejiang University

Journal volume & issue: Vol. 15, no. 1, pp. 1–16

Abstract

Retrosynthesis is a crucial task in drug discovery and organic synthesis, where artificial intelligence (AI) is increasingly employed to expedite the process. However, existing approaches employ token-by-token decoding methods to translate target molecule strings into corresponding precursors, exhibiting unsatisfactory performance and limited diversity. As chemical reactions typically induce local molecular changes, reactants and products often overlap significantly. Inspired by this fact, we propose reframing single-step retrosynthesis prediction as a molecular string editing task, iteratively refining target molecule strings to generate precursor compounds. Our proposed approach involves a fragment-based generative editing model that uses explicit sequence editing operations. Additionally, we design an inference module with reposition sampling and sequence augmentation to enhance both prediction accuracy and diversity. Extensive experiments demonstrate that our model generates high-quality and diverse results, achieving superior performance with a promising top-1 accuracy of 60.8% on the standard benchmark dataset USPTO-50K.
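The overlap between product and reactant strings that motivates the editing formulation can be illustrated with a small sketch. This is not the authors' model: it uses Python's standard `difflib.SequenceMatcher` to recover a character-level edit script between two SMILES strings, and the example reaction (acetanilide from acetyl chloride and aniline) as well as the helper names `edit_script` and `apply_edits` are assumptions chosen for illustration only.

```python
from difflib import SequenceMatcher


def edit_script(source, target):
    """Return the non-trivial edits (op, start, end, replacement) that turn source into target."""
    matcher = SequenceMatcher(None, source, target, autojunk=False)
    return [
        (op, i1, i2, target[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"  # unchanged spans need no edit
    ]


def apply_edits(source, edits):
    """Apply edits right-to-left so that earlier indices remain valid."""
    result = source
    for _op, start, end, replacement in sorted(edits, key=lambda e: e[1], reverse=True):
        result = result[:start] + replacement + result[end:]
    return result


# Illustrative example (an assumption, not taken from the paper's dataset):
product = "CC(=O)Nc1ccccc1"        # acetanilide
reactants = "CC(=O)Cl.Nc1ccccc1"   # acetyl chloride + aniline

edits = edit_script(product, reactants)
print(edits)
print(apply_edits(product, edits) == reactants)
```

In this toy case a single short insertion converts the product string into the reactant string and the rest is copied unchanged, which is the kind of locality the editing formulation is meant to exploit. The model described in the abstract learns its editing operations and works at the fragment level; a fixed diff like this one only shows the shape of the problem.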

Published in Nature Communications

ISSN: 2041-1723 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Science
Website: https://www.nature.com/ncomms/
