Systematic Evaluation of Normalization Methods for Glycomics Data Based on Performance of Network Inference
Elisa Benedetti,
Nathalie Gerstner,
Maja Pučić-Baković,
Toma Keser,
Karli R. Reiding,
L. Renee Ruhaak,
Tamara Štambuk,
Maurice H.J. Selman,
Igor Rudan,
Ozren Polašek,
Caroline Hayward,
Marian Beekman,
Eline Slagboom,
Manfred Wuhrer,
Malcolm G. Dunlop,
Gordan Lauc,
Jan Krumsiek
Affiliations
Elisa Benedetti
Department of Physiology and Biophysics, Institute for Computational Biomedicine, Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY 10022, USA
Nathalie Gerstner
Institute of Computational Biology, Helmholtz Zentrum München—German Research Center for Environmental Health, 85764 Neuherberg, Germany
Maja Pučić-Baković
Genos Glycoscience Research Laboratory, 10000 Zagreb, Croatia
Toma Keser
Faculty of Pharmacy and Biochemistry, University of Zagreb, 10000 Zagreb, Croatia
Karli R. Reiding
Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, University of Utrecht, 3584 CH Utrecht, The Netherlands
L. Renee Ruhaak
Center for Proteomics and Metabolomics, Leiden University Medical Center, 2333 ZC Leiden, The Netherlands
Tamara Štambuk
Faculty of Pharmacy and Biochemistry, University of Zagreb, 10000 Zagreb, Croatia
Maurice H.J. Selman
Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, University of Utrecht, 3584 CH Utrecht, The Netherlands
Igor Rudan
Usher Institute of Population Health Sciences and Informatics, University of Edinburgh, Edinburgh EH8 9AG, UK
Ozren Polašek
Medical School, University of Split, 21000 Split, Croatia
Caroline Hayward
Medical Research Council Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh EH4 2XU, UK
Marian Beekman
Section of Molecular Epidemiology, Leiden University Medical Center, 2333 ZC Leiden, The Netherlands
Eline Slagboom
Section of Molecular Epidemiology, Leiden University Medical Center, 2333 ZC Leiden, The Netherlands
Manfred Wuhrer
Center for Proteomics and Metabolomics, Leiden University Medical Center, 2333 ZC Leiden, The Netherlands
Malcolm G. Dunlop
Colon Cancer Genetics Group, Institute of Genetics and Molecular Medicine, University of Edinburgh and Medical Research Council Human Genetics Unit, Edinburgh EH8 9YL, UK
Gordan Lauc
Genos Glycoscience Research Laboratory, 10000 Zagreb, Croatia
Jan Krumsiek
Department of Physiology and Biophysics, Institute for Computational Biomedicine, Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY 10022, USA
Glycomics measurements, like all other high-throughput technologies, are subject to technical variation due to fluctuations in the experimental conditions. The removal of this non-biological signal from the data is referred to as normalization. In contrast to other omics data types, no systematic evaluation of normalization options for glycomics data has been published so far. In this paper, we assess the quality of different normalization strategies for glycomics data with an innovative approach. It has been shown previously that Gaussian Graphical Models (GGMs) inferred from glycomics data are able to identify enzymatic steps in the glycan synthesis pathways in a data-driven fashion. Based on this finding, we quantify the quality of a given normalization method according to how well a GGM inferred from the respective normalized data reconstructs known synthesis reactions in the glycosylation pathway. The method therefore exploits a biological measure of goodness. We analyzed 23 different normalization combinations applied to six large-scale glycomics cohorts across three experimental platforms: Liquid Chromatography-Electrospray Ionization-Mass Spectrometry (LC-ESI-MS), Ultra High Performance Liquid Chromatography with Fluorescence Detection (UHPLC-FLD), and Matrix Assisted Laser Desorption Ionization-Fourier Transform Ion Cyclotron Resonance-Mass Spectrometry (MALDI-FTICR-MS). Based on our results, we recommend normalizing glycan data using the ‘Probabilistic Quotient’ method followed by log-transformation, irrespective of the measurement platform. This recommendation is further supported by an additional analysis, in which we ranked normalization methods based on their statistical associations with age, a factor known to be associated with glycomics measurements.
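As an illustration of the recommended procedure, the following minimal Python sketch applies Probabilistic Quotient normalization followed by a log-transformation, and then computes the partial correlations on which a GGM is based. It is not the implementation used in this study: the function names `pqn_log` and `partial_correlations` are hypothetical, the reference profile is assumed to be the per-glycan median across samples, and the partial correlations are assumed to come from a plain (pseudo-)inverse of the sample covariance matrix, without the significance filtering or regularization a full analysis would likely require.

```python
import numpy as np


def pqn_log(X):
    """Probabilistic Quotient normalization followed by a natural log.

    X: array of shape (samples, glycans) with strictly positive intensities.
    Assumption: the reference profile is the per-glycan median over samples.
    """
    X = np.asarray(X, dtype=float)
    reference = np.median(X, axis=0)          # one reference value per glycan
    quotients = X / reference                 # sample-to-reference ratios
    # The median quotient of each sample estimates its dilution factor.
    dilution = np.median(quotients, axis=1, keepdims=True)
    return np.log(X / dilution)


def partial_correlations(Z):
    """Partial correlation matrix of the columns of Z (samples x glycans).

    Assumption: computed from the pseudo-inverse of the sample covariance.
    """
    precision = np.linalg.pinv(np.cov(Z, rowvar=False))
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)        # rho_ij = -P_ij / sqrt(P_ii P_jj)
    np.fill_diagonal(pcor, 1.0)
    return pcor


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated intensities: 100 samples, 5 glycans, with random dilution.
    raw = rng.lognormal(mean=2.0, sigma=0.3, size=(100, 5))
    raw *= rng.uniform(0.5, 2.0, size=(100, 1))
    normalized = pqn_log(raw)
    print(partial_correlations(normalized).round(2))
```

In such a sketch, glycan pairs with large absolute partial correlation would be the candidate edges to compare against known synthesis reactions in the glycosylation pathway.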