PredCoffee: A binary classification approach specifically for coffee odor
Yi He, Ruirui Huang, Ruoyu Zhang, Fei He, Lu Han, Weiwei Han
Affiliations
Yi He
Key Laboratory for Molecular Enzymology and Engineering of Ministry of Education, School of Life Sciences, Jilin University, 2699 Qianjin Street, Changchun 130012, China
Ruirui Huang
Key Laboratory for Molecular Enzymology and Engineering of Ministry of Education, School of Life Sciences, Jilin University, 2699 Qianjin Street, Changchun 130012, China
Ruoyu Zhang
Key Laboratory for Molecular Enzymology and Engineering of Ministry of Education, School of Life Sciences, Jilin University, 2699 Qianjin Street, Changchun 130012, China
Fei He
Department of Electrical Engineering and Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA; Corresponding author
Lu Han
Key Laboratory for Molecular Enzymology and Engineering of Ministry of Education, School of Life Sciences, Jilin University, 2699 Qianjin Street, Changchun 130012, China; Corresponding author
Weiwei Han
Key Laboratory for Molecular Enzymology and Engineering of Ministry of Education, School of Life Sciences, Jilin University, 2699 Qianjin Street, Changchun 130012, China; Corresponding author
Summary: Compared with traditional methods, machine learning can reduce the cost of assessing or predicting the odor of molecules in several respects. This study aims to collect molecules with a coffee odor, summarize the patterns they share, and build a binary classifier that determines whether a molecule has a coffee odor. A total of 371 coffee-odor molecules and 9,700 non-coffee-odor molecules were collected. Five models were trained on these data: Knowledge-guided Pre-training of Graph Transformer (KPGT), support vector machine (SVM), random forest (RF), multi-layer perceptron (MLP), and message-passing neural network (MPNN). The best-performing model was selected as the basis of the predictor. The KPGT model achieved a prediction accuracy above 0.84, and the predictor has been deployed as a web server, PredCoffee.
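The dataset described in the summary is strongly imbalanced (371 positives against 9,700 negatives). The summary does not say how this imbalance was handled, so the following is only a minimal sketch of one common heuristic, inverse-frequency class weighting, and not the authors' procedure. The function name `balanced_class_weights` and the class labels are hypothetical; only the two counts come from the text.

```python
# Class counts as reported in the summary.
N_COFFEE = 371
N_NON_COFFEE = 9_700


def balanced_class_weights(counts):
    """Return inverse-frequency weights: n_samples / (n_classes * n_class).

    This is an assumed, generic heuristic for imbalanced binary data;
    the summary does not state that the authors used it.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


# Hypothetical label names for the two classes.
counts = {"coffee": N_COFFEE, "non_coffee": N_NON_COFFEE}
weights = balanced_class_weights(counts)

# Number of non-coffee molecules per coffee molecule.
imbalance_ratio = N_NON_COFFEE / N_COFFEE

for label, weight in weights.items():
    print(f"{label}: n={counts[label]}, weight={weight:.3f}")
print(f"negatives per positive: {imbalance_ratio:.1f}")
```

With these weights each class contributes the same total weight to a training loss, so the minority coffee-odor class is not swamped by the much larger negative set.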