Machine learning framework to extract the biomarker potential of plasma IgG N-glycans towards disease risk stratification

Konstantinos Flevaris; Joseph Davies; Shoh Nakai; Frano Vučković; Gordan Lauc; Malcolm G. Dunlop; Cleo Kontoravdi

Computational and Structural Biotechnology Journal (Dec 2024)

Affiliations

Konstantinos Flevaris: Department of Chemical Engineering, Imperial College London, London SW7 2AZ, United Kingdom; Corresponding author.
Joseph Davies: Department of Chemical Engineering, Imperial College London, London SW7 2AZ, United Kingdom
Shoh Nakai: Department of Chemical Engineering, Imperial College London, London SW7 2AZ, United Kingdom
Frano Vučković: Genos Glycoscience Research Laboratory, Zagreb 10000, Croatia
Gordan Lauc: Genos Glycoscience Research Laboratory, Zagreb 10000, Croatia; Department of Biochemistry and Molecular Biology, Faculty of Pharmacy and Biochemistry, University of Zagreb, Zagreb, Croatia
Malcolm G. Dunlop: Colon Cancer Genetics Group, Institute of Genetics and Cancer, Cancer Research UK Scotland Centre, University of Edinburgh and Medical Research Council Human Genetics Unit, Edinburgh, United Kingdom
Cleo Kontoravdi: Department of Chemical Engineering, Imperial College London, London SW7 2AZ, United Kingdom; Corresponding author.

Journal volume & issue: Vol. 23
pp. 1234 – 1243

Abstract

Effective management of chronic diseases and cancer can benefit greatly from disease-specific biomarkers that enable informative screening and timely diagnosis. IgG N-glycans found in human plasma have the potential to be minimally invasive disease-specific biomarkers for all stages of disease development due to their plasticity in response to various genetic and environmental stimuli. Data analysis and machine learning (ML) approaches can help harness the potential of IgG glycomics for biomarker discovery and the development of reliable predictive tools for disease screening. This study proposes an ML-based N-glycomic analysis framework that can be used to build, optimise, and evaluate multiple ML pipelines to stratify patients by disease risk in an interpretable manner. To design and test this framework, a published colorectal cancer (CRC) dataset from the Study of Colorectal Cancer in Scotland (SOCCS) cohort (1999–2006) was used. Among the pipelines tested, an XGBoost-based ML pipeline, which was tuned using multi-objective optimisation, calibrated using an inductive Venn-Abers predictor (IVAP), and evaluated via a nested cross-validation (NCV) scheme, achieved a mean area under the receiver operating characteristic curve (AUC-ROC) of 0.771 when discriminating between age- and sex-matched healthy controls and CRC patients. This performance suggests that the relative abundance of IgG N-glycans could be used to define populations at elevated CRC risk who merit investigation or surveillance. Finally, the IgG N-glycans with the greatest impact on CRC classification decisions were identified using a global model-agnostic interpretability technique, namely Accumulated Local Effects (ALE). We envision that open-source computational frameworks, such as the one presented herein, will be useful in supporting the translation of glycan-based biomarkers into clinical applications.
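
The nested cross-validation scheme and the mean AUC-ROC mentioned in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' framework. The data are synthetic, a simple shrunken-centroid scorer stands in for XGBoost, and all names (`nested_cv`, `fit_centroid_scorer`, the `shrink` hyperparameter, the fold counts) are assumptions made for illustration. Multi-objective tuning, IVAP calibration, and ALE are not shown. The inner folds select the hyperparameter and the outer folds, never seen during selection, estimate performance.

```python
import random
from statistics import mean

def auc_roc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def stratified_folds(labels, k, rng):
    """Split indices into k folds while keeping the class ratio roughly equal in each fold."""
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

def fit_centroid_scorer(X, y, shrink):
    """Stand-in model (NOT XGBoost): project onto the soft-thresholded difference of class means."""
    n_feat = len(X[0])
    mu = []
    for cls in (0, 1):
        rows = [x for x, t in zip(X, y) if t == cls]
        mu.append([mean(r[j] for r in rows) for j in range(n_feat)])
    diff = [mu[1][j] - mu[0][j] for j in range(n_feat)]
    # 'shrink' is the tuned hyperparameter: small mean differences are set to zero.
    w = [max(abs(v) - shrink, 0.0) * (1 if v > 0 else -1) for v in diff]
    return lambda x: sum(wj * xj for wj, xj in zip(w, x))

def cv_auc(X, y, folds, shrink):
    """Mean held-out AUC-ROC over the given folds for one hyperparameter value."""
    aucs = []
    for held in folds:
        held_set = set(held)
        train = [i for f in folds for i in f if i not in held_set]
        model = fit_centroid_scorer([X[i] for i in train], [y[i] for i in train], shrink)
        aucs.append(auc_roc([y[i] for i in held], [model(X[i]) for i in held]))
    return mean(aucs)

def nested_cv(X, y, grid, outer_k=5, inner_k=3, seed=0):
    """Outer loop estimates performance; inner loop picks 'shrink' using training data only."""
    rng = random.Random(seed)
    outer_aucs = []
    for held in stratified_folds(y, outer_k, rng):
        held_set = set(held)
        train = [i for i in range(len(y)) if i not in held_set]
        Xtr, ytr = [X[i] for i in train], [y[i] for i in train]
        inner = stratified_folds(ytr, inner_k, rng)
        best = max(grid, key=lambda s: cv_auc(Xtr, ytr, inner, s))
        model = fit_centroid_scorer(Xtr, ytr, best)
        outer_aucs.append(auc_roc([y[i] for i in held], [model(X[i]) for i in held]))
    return mean(outer_aucs), outer_aucs

# Synthetic stand-in for glycan abundances: 10 features, the first 3 weakly informative.
data_rng = random.Random(42)
y = [i % 2 for i in range(200)]
X = [[data_rng.gauss(0.6 * t if j < 3 else 0.0, 1.0) for j in range(10)] for t in y]

mean_auc, fold_aucs = nested_cv(X, y, grid=[0.0, 0.1, 0.2, 0.4])
print(f"mean outer-fold AUC-ROC: {mean_auc:.3f}")
```

Keeping hyperparameter selection inside the inner loop means the outer-fold AUC-ROC is not inflated by tuning on the same samples used for evaluation, which is the usual motivation for a nested scheme.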

Published in Computational and Structural Biotechnology Journal

ISSN: 2001-0370 (Online)
Publisher: Elsevier
Country of publisher: Netherlands
LCC subjects: Technology: Chemical technology: Biotechnology
Website: https://www.journals.elsevier.com/computational-and-structural-biotechnology-journal
