Could graph neural networks learn better molecular representation for drug discovery? A comparison study of descriptor-based and graph-based models

Dejun Jiang; Zhenxing Wu; Chang-Yu Hsieh; Guangyong Chen; Ben Liao; Zhe Wang; Chao Shen; Dongsheng Cao; Jian Wu; Tingjun Hou

doi:10.1186/s13321-020-00479-8

Journal of Cheminformatics (Feb 2021)

Affiliations

Dejun Jiang: Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University
Zhenxing Wu: Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University
Chang-Yu Hsieh: Tencent Quantum Laboratory, Tencent
Guangyong Chen: Shenzhen Institutes of Advanced Technology
Ben Liao: Tencent Quantum Laboratory, Tencent
Zhe Wang: Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University
Chao Shen: Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University
Dongsheng Cao: Xiangya School of Pharmaceutical Sciences, Central South University
Jian Wu: College of Computer Science and Technology, Zhejiang University
Tingjun Hou: Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University

Journal volume & issue: Vol. 13, no. 1
pp. 1 – 23

Abstract

Graph neural networks (GNNs) have been considered an attractive modelling method for molecular property prediction, and numerous studies have shown that GNNs could yield more promising results than traditional descriptor-based methods. In this study, based on 11 public datasets covering various property endpoints, the predictive capacity and computational efficiency of prediction models developed with eight machine learning (ML) algorithms, including four descriptor-based models (SVM, XGBoost, RF and DNN) and four graph-based models (GCN, GAT, MPNN and Attentive FP), were extensively tested and compared. The results demonstrate that, on average, the descriptor-based models outperform the graph-based models in terms of both prediction accuracy and computational efficiency. SVM generally achieves the best predictions for the regression tasks. Both RF and XGBoost can achieve reliable predictions for the classification tasks, while some of the graph-based models, such as Attentive FP and GCN, can yield outstanding performance on a fraction of the larger or multi-task datasets. In terms of computational cost, XGBoost and RF are the two most efficient algorithms, needing only a few seconds to train a model even on a large dataset. Model interpretation with the SHAP method can effectively explore established domain knowledge for the descriptor-based models. Finally, we explored the use of these models for virtual screening (VS) towards HIV and demonstrated that different ML algorithms offer diverse VS profiles. All in all, we believe that off-the-shelf descriptor-based models can still be directly employed to accurately predict various chemical endpoints with excellent computability and interpretability.
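The abstract mentions interpreting the descriptor-based models with SHAP, whose attributions are Shapley values from cooperative game theory. As a rough illustration only (not the authors' code and not the API of the `shap` package), the sketch below computes exact Shapley values by enumerating every coalition of features, assuming that a feature "absent" from a coalition simply takes its value from a single baseline vector. The toy model and its three descriptors (logP, TPSA, H-bond donors) are hypothetical. Exhaustive enumeration costs 2^n model evaluations, which is why practical tools rely on approximations or model-specific algorithms such as the tree explainer in the `shap` package.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def shapley_values(model: Callable[[Sequence[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> list[float]:
    """Exact Shapley values of `model` at `x`, relative to one baseline vector.

    Assumption: a feature outside a coalition is replaced by its baseline value.
    Requires O(2^n) model calls, so it is only viable for a handful of features.
    """
    n = len(x)
    if len(baseline) != n:
        raise ValueError("x and baseline must have the same length")

    def value(coalition: frozenset[int]) -> float:
        # Features in the coalition keep their observed value; the rest use the baseline.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions S of this size: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition S.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy "descriptor-based" model: two additive terms plus one
# logP x H-bond-donor interaction term.
def toy_model(d: Sequence[float]) -> float:
    logp, tpsa, hbd = d
    return 0.8 * logp - 0.02 * tpsa + 0.1 * logp * hbd


x = [3.0, 60.0, 2.0]         # hypothetical molecule: logP, TPSA, HBD
baseline = [1.0, 80.0, 1.0]  # hypothetical reference values
phi = shapley_values(toy_model, x, baseline)
print([round(p, 3) for p in phi])  # prints [1.9, 0.4, 0.2]
```

The attributions sum to `toy_model(x) - toy_model(baseline)` (here 2.5), the additivity property that makes SHAP values readable as a per-descriptor breakdown of a single prediction. The interaction term's contribution is split between logP and H-bond donors rather than assigned to either alone.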

Published in Journal of Cheminformatics

ISSN: 1758-2946 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology; Science: Chemistry
Website: https://jcheminf.biomedcentral.com/