Improving the performance of models for one-step retrosynthesis through re-ranking

Min Htoo Lin; Zhengkai Tu; Connor W. Coley

doi:10.1186/s13321-022-00594-8

Journal of Cheminformatics (Mar 2022)

Affiliations

Min Htoo Lin: Division of Chemistry and Biological Chemistry, School of Physical and Mathematical Sciences, Nanyang Technological University
Zhengkai Tu: Computational Science and Engineering, Massachusetts Institute of Technology
Connor W. Coley: Department of Chemical Engineering, Massachusetts Institute of Technology

DOI: https://doi.org/10.1186/s13321-022-00594-8
Journal volume & issue: Vol. 14, no. 1, pp. 1–13

Abstract


Retrosynthesis is at the core of organic chemistry. Recently, the rapid growth of artificial intelligence (AI) has spurred a variety of novel machine learning approaches for data-driven synthesis planning. These methods learn complex patterns from reaction databases in order to predict, for a given product, sets of reactants that can be used to synthesise that product. However, their performance, as measured by top-N accuracy in matching published reaction precedents, still leaves room for improvement. This work aims to enhance these models by learning to re-rank their reactant predictions. Specifically, we design and train an energy-based model that, for each product, re-ranks the published reaction as the top suggestion and the remaining reactant predictions as lower-ranked. Using the standard USPTO-50k benchmark dataset, we show that re-ranking can significantly improve one-step models: RetroSim, a similarity-based method, improves from 35.7 to 51.8% top-1 accuracy, and NeuralSym, a deep learning method, from 45.7 to 51.3%. We also show that re-ranking the union of two models’ suggestions can lead to better performance than either alone. However, the state-of-the-art top-1 accuracy is not improved by this method.
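To make the re-ranking idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes that an energy function scores each (product, reactants) pair and that lower energy means a more plausible suggestion, which is the usual energy-based-model convention but is not stated in the abstract. The names `rerank`, `top_n_accuracy`, and `toy_energy`, the SMILES strings, and the energy values are all hypothetical and chosen only for illustration; a trained model would replace the lookup table.

```python
from typing import Callable, Dict, List


def rerank(product: str,
           candidates: List[str],
           energy: Callable[[str, str], float]) -> List[str]:
    """Return candidates sorted by ascending energy (assumed: lower = better)."""
    return sorted(candidates, key=lambda reactants: energy(product, reactants))


def top_n_accuracy(ranked: Dict[str, List[str]],
                   truth: Dict[str, str],
                   n: int) -> float:
    """Fraction of products whose published reactants appear in the top n."""
    hits = sum(1 for product, reactants in truth.items()
               if reactants in ranked.get(product, [])[:n])
    return hits / len(truth)


# Hypothetical toy data: one product (ethyl acetate) with two suggestions
# from a one-step model, listed in the model's original rank order.
product = "CCOC(C)=O"
proposals = {product: ["CC(=O)Cl.CCO", "CC(=O)O.CCO"]}
truth = {product: "CC(=O)O.CCO"}  # stand-in for the published precedent

# Hypothetical energies standing in for a trained energy-based model.
TOY_ENERGIES = {
    (product, "CC(=O)Cl.CCO"): 1.3,
    (product, "CC(=O)O.CCO"): -0.4,
}


def toy_energy(prod: str, reactants: str) -> float:
    """Look up a made-up energy for a (product, reactants) pair."""
    return TOY_ENERGIES[(prod, reactants)]


reranked = {p: rerank(p, cands, toy_energy) for p, cands in proposals.items()}
top1_before = top_n_accuracy(proposals, truth, n=1)
top1_after = top_n_accuracy(reranked, truth, n=1)
```

In this toy case the published reactants move from rank 2 to rank 1, so top-1 accuracy rises while top-2 accuracy is unchanged. Re-ranking can only reorder suggestions already present in the candidate list, which is consistent with the abstract's point that re-ranking the union of two models' suggestions can help.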

Published in Journal of Cheminformatics

ISSN: 1758-2946 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology; Science: Chemistry
Website: https://jcheminf.biomedcentral.com/
