MetaRF: attention-based random forest for reaction yield prediction with a few trails

Kexin Chen; Guangyong Chen; Junyou Li; Yuansheng Huang; Ercheng Wang; Tingjun Hou; Pheng-Ann Heng

doi:10.1186/s13321-023-00715-x

Journal of Cheminformatics (Apr 2023)

Affiliations

Kexin Chen: Department of Computer Science and Engineering, The Chinese University of Hong Kong
Guangyong Chen: Zhejiang Lab
Junyou Li: Zhejiang Lab
Yuansheng Huang: College of Pharmaceutical Sciences, Zhejiang University
Ercheng Wang: Zhejiang Lab
Tingjun Hou: College of Pharmaceutical Sciences, Zhejiang University
Pheng-Ann Heng: Department of Computer Science and Engineering, The Chinese University of Hong Kong

Journal volume & issue: Vol. 15, no. 1, pp. 1–12

Abstract

Artificial intelligence has deeply revolutionized the field of medicinal chemistry with many impressive applications, but the success of these applications requires a massive amount of training samples with high-quality annotations, which seriously limits the wide usage of data-driven methods. In this paper, we focus on the reaction yield prediction problem, which assists chemists in selecting high-yield reactions in a new chemical space with only a few experimental trials. To address this challenge, we first put forth MetaRF, an attention-based random forest model specially designed for few-shot yield prediction, where the attention weights of the random forest are automatically optimized by a meta-learning framework and can be quickly adapted to predict the performance of new reagents when given a few additional samples. To improve the few-shot learning performance, we further introduce a dimension-reduction-based sampling method to determine valuable samples to be experimentally tested and then learned. Our methodology is evaluated on three different datasets and achieves satisfactory performance on few-shot prediction. On high-throughput experimentation (HTE) datasets, the average yield of our methodology’s top 10 high-yield reactions is relatively close to the results of ideal yield selection.

Published in Journal of Cheminformatics

ISSN: 1758-2946 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology; Science: Chemistry
Website: https://jcheminf.biomedcentral.com/
