Computational and Structural Biotechnology Journal (Jan 2023)
Exploring the variable space of shallow machine learning models for reversed-phase retention time prediction
Abstract
Peptide retention time (RT) prediction algorithms are tools to study and identify the physicochemical properties that drive the peptide-sorbent interaction. Traditional RT algorithms use multiple linear regression with manually curated parameters to determine the degree of direct contribution of each parameter, and improvements in RT prediction accuracy have relied on superior feature engineering. Deep learning led to a significant increase in RT prediction accuracy and automated feature engineering by chaining multiple learning modules. However, the significance and the identity of the variables extracted this way are not well understood, owing to the inherent complexity of interpreting the "relationships-of-relationships" encoded in deep learning variables. To achieve accuracy and interpretability simultaneously, we isolated the individual modules used in deep learning; these isolated modules are the shallow learners employed for RT prediction in this work. Using a shallow convolutional neural network (CNN) and a gated recurrent unit (GRU), we find that the spatial features obtained via the CNN correlate with real-world physicochemical properties, namely collisional cross sections (CCS) and variations of accessible surface area (ASA). Furthermore, we determined that the discovered parameters are "micro-coefficients" that contribute to the "macro-coefficient", hydrophobicity. Manually embedding CCS and the variations of ASA into the GRU model yielded R2 = 0.981 using only 525 variables and represented 88% of the ∼110,000 tryptic peptides in our dataset. This work highlights that the feature discovery process of our shallow learners can exceed traditional RT models in performance while offering better interpretability than the deep learning RT algorithms found in the literature.
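
For orientation, the sketch below illustrates the kind of shallow learners the abstract describes: a single-convolution CNN branch for spatial sequence features, a single-layer GRU branch, and hand-embedded descriptors (e.g., CCS and ASA variants) concatenated before a retention-time regression head. It is a minimal illustration, not the authors' code; the layer sizes, 40-residue padding length, 20-letter amino-acid alphabet, and two-descriptor input are illustrative assumptions.

# Minimal sketch (assumptions noted above), not the published implementation.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

MAX_LEN, N_AA, N_DESCRIPTORS = 40, 20, 2  # assumed padded length, amino-acid alphabet, [CCS, ASA]

# One-hot encoded peptide sequence input
seq_in = layers.Input(shape=(MAX_LEN, N_AA), name="peptide_onehot")
# Manually embedded physicochemical descriptors (e.g., CCS and ASA variants)
desc_in = layers.Input(shape=(N_DESCRIPTORS,), name="descriptors")

# Shallow CNN branch: one convolution + global pooling, no stacked modules
x_cnn = layers.Conv1D(32, kernel_size=3, activation="relu")(seq_in)
x_cnn = layers.GlobalMaxPooling1D()(x_cnn)

# Shallow GRU branch: a single recurrent layer over the same sequence
x_gru = layers.GRU(32)(seq_in)

# Concatenate learned features with the hand-embedded descriptors
x = layers.Concatenate()([x_cnn, x_gru, desc_in])
rt_out = layers.Dense(1, name="retention_time")(x)

model = Model([seq_in, desc_in], rt_out)
model.compile(optimizer="adam", loss="mse")

# Smoke test on random data to confirm the graph wires together
X_seq = np.random.rand(8, MAX_LEN, N_AA).astype("float32")
X_desc = np.random.rand(8, N_DESCRIPTORS).astype("float32")
print(model.predict([X_seq, X_desc], verbose=0).shape)  # (8, 1)

In practice, either branch can be trained alone to inspect what its features capture (the interpretability step), and predictions would be evaluated against measured RTs with a metric such as R2.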