PeerJ Computer Science (May 2024)

D-CyPre: a machine learning-based tool for accurate prediction of human CYP450 enzyme metabolic sites

  • Haolan Yang,
  • Jie Liu,
  • Kui Chen,
  • Shiyu Cong,
  • Shengnan Cai,
  • Yueting Li,
  • Zhixin Jia,
  • Hao Wu,
  • Tianyu Lou,
  • Zuying Wei,
  • Xiaoqin Yang,
  • Hongbin Xiao

DOI
https://doi.org/10.7717/peerj-cs.2040
Journal volume & issue
Vol. 10
p. e2040

Abstract


Advances in graph neural networks (GNNs) have made it possible to predict metabolic sites accurately. Although combining GNNs with XGBOOST has shown impressive performance, the combination has not yet been applied to metabolic site prediction. Previous metabolic site prediction tools focused on bonds and atoms and ignored the overall molecular skeleton. This study introduces D-CyPre, a tool that combines atom, bond, and molecular skeleton information via two directed message-passing neural networks (D-MPNNs) and uses XGBOOST to predict the metabolic sites of nine cytochrome P450 enzymes. In Precision Mode, D-CyPre produces fewer but more accurate results (Jaccard score: 0.497, F1: 0.660, and precision: 0.737 on the test set). In Recall Mode, it produces less accurate but more comprehensive results (Jaccard score: 0.506, F1: 0.669, and recall: 0.720 on the test set). On the test set of 68 reactants, D-CyPre outperformed BioTransformer on all isoenzymes and CyProduct on most isoenzymes (5/9). For the isoenzymes where D-CyPre outperformed CyProduct, the Jaccard and F1 scores increased by 24% and 16%, respectively, in Precision Mode (4/9) and by 19% and 12% in Recall Mode (5/9), relative to CyProduct, the second-best tool. Overall, D-CyPre provides more accurate predictions of human CYP450 enzyme metabolic sites.
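
The Jaccard score, F1, precision, and recall reported in the abstract all measure the overlap between predicted and observed metabolic sites. The following is a minimal Python sketch of how such scores can be computed for a single molecule, assuming sites are represented as sets of atom indices. The function name `site_metrics` and the example indices are hypothetical, and the sketch does not reproduce how the paper aggregates scores across molecules or isoenzymes.

```python
def site_metrics(predicted, actual):
    """Compare predicted and observed metabolic sites for one molecule.

    Both arguments are iterables of site identifiers (assumed here to be
    atom indices). Returns precision, recall, F1, and Jaccard score.
    """
    predicted, actual = set(predicted), set(actual)
    tp = len(predicted & actual)   # sites predicted and observed
    fp = len(predicted - actual)   # sites predicted but not observed
    fn = len(actual - predicted)   # sites observed but not predicted

    # Each score is defined as 0.0 when its denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    jaccard = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "jaccard": jaccard}


# Hypothetical example: three predicted sites, four observed sites.
scores = site_metrics(predicted={3, 7, 12}, actual={3, 7, 9, 15})
print({name: round(value, 3) for name, value in scores.items()})
```

A model tuned for precision tends to return fewer sites, which reduces false positives at the cost of more false negatives; a model tuned for recall does the reverse. This is the trade-off that the two D-CyPre modes describe.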

Keywords