Challenges in Lipidomics Biomarker Identification: Avoiding the Pitfalls and Improving Reproducibility

Johanna von Gerichten; Kyle Saunders; Melanie J. Bailey; Lee A. Gethings; Anthony Onoja; Nophar Geifman; Matt Spick

doi:10.3390/metabo14080461

Metabolites (Aug 2024)

Affiliations

Johanna von Gerichten: School of Chemistry and Chemical Engineering, Faculty of Engineering and Physical Sciences, University of Surrey, Guildford GU2 7XH, UK
Kyle Saunders: School of Chemistry and Chemical Engineering, Faculty of Engineering and Physical Sciences, University of Surrey, Guildford GU2 7XH, UK
Melanie J. Bailey: School of Chemistry and Chemical Engineering, Faculty of Engineering and Physical Sciences, University of Surrey, Guildford GU2 7XH, UK
Lee A. Gethings: Waters Corporation, Wilmslow SK9 4AX, UK
Anthony Onoja: School of Health Sciences, Faculty of Health and Medical Sciences, University of Surrey, Guildford GU2 7XH, UK
Nophar Geifman: School of Health Sciences, Faculty of Health and Medical Sciences, University of Surrey, Guildford GU2 7XH, UK
Matt Spick: School of Health Sciences, Faculty of Health and Medical Sciences, University of Surrey, Guildford GU2 7XH, UK

DOI: https://doi.org/10.3390/metabo14080461
Journal volume & issue: Vol. 14, no. 8, p. 461

Abstract

Identification of features with high levels of confidence in liquid chromatography–mass spectrometry (LC–MS) lipidomics research is an essential part of biomarker discovery, but existing software platforms can give inconsistent results, even from identical spectral data. This poses a clear challenge for reproducibility in biomarker identification. In this work, we illustrate the reproducibility gap for two open-access lipidomics platforms, MS-DIAL and Lipostar, finding just 14.0% identification agreement when analyzing identical LC–MS spectra using default settings. Whilst the software platforms performed more consistently using fragmentation data, agreement was still only 36.1% for MS2 spectra. This highlights the critical importance of validation across positive and negative LC–MS modes, as well as the manual curation of spectra and lipidomics software outputs, to reduce identification errors caused by closely related lipids and co-elution issues. This curation process can be supplemented by data-driven outlier detection in assessing spectral outputs, which is demonstrated here using a novel machine learning approach based on support vector machine regression combined with leave-one-out cross-validation. These steps are essential to reduce the frequency of false positive identifications and close the reproducibility gap, including between software platforms, which, for downstream users such as bioinformaticians and clinicians, can be an underappreciated source of biomarker identification errors.
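
To make the idea of "identification agreement" between two platforms concrete, the following is a minimal Python sketch and not the authors' procedure. The abstract does not state how matches were defined or which denominator was used, so everything here is an assumption: the `Identification` record, the `agreement_percent` function, the 5-second retention-time tolerance, and the choice of the union of all identifications as the denominator are all hypothetical, and the lipid names and retention times are made-up illustrative values.

```python
from typing import NamedTuple


class Identification(NamedTuple):
    """One putative lipid identification (hypothetical record layout)."""
    name: str          # annotation string, e.g. "PC 34:1"
    rt_seconds: float  # retention time of the annotated feature


def agreement_percent(a, b, rt_tolerance=5.0):
    """Percentage of identifications reported by both platforms.

    Assumptions (not taken from the abstract): two identifications match
    when their names are equal and their retention times differ by at
    most rt_tolerance seconds; the denominator is the union of all
    identifications across both platforms.
    """
    unmatched_b = list(b)
    matched = 0
    for ident in a:
        for candidate in unmatched_b:
            same_name = ident.name == candidate.name
            close_rt = abs(ident.rt_seconds - candidate.rt_seconds) <= rt_tolerance
            if same_name and close_rt:
                matched += 1
                unmatched_b.remove(candidate)  # each output row matches at most once
                break
    union = len(a) + len(b) - matched
    return 100.0 * matched / union if union else 0.0


# Illustrative, invented outputs from two platforms run on the same spectra.
platform_a = [
    Identification("PC 34:1", 312.0),
    Identification("PE 36:2", 340.5),
    Identification("TG 52:2", 610.0),
]
platform_b = [
    Identification("PC 34:1", 314.2),  # matches: same name, 2.2 s apart
    Identification("PE 36:2", 352.0),  # same name, but 11.5 s apart
    Identification("SM 34:1", 298.0),
]

print(f"{agreement_percent(platform_a, platform_b):.1f}% agreement")
```

With these toy inputs only one of five distinct identifications is shared. A stricter or looser matching rule, or a different denominator, would change the figure, which is one reason agreement percentages should be reported together with the matching criteria.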

Published in Metabolites

ISSN: 2218-1989 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Microbiology
Website: http://www.mdpi.com/journal/metabolites
