A Case Report of Switching from Specific Vendor-Based to R-Based Pipelines for Untargeted LC-MS Metabolomics
Álvaro Fernández-Ochoa,
Rosa Quirantes-Piné,
Isabel Borrás-Linares,
María de la Luz Cádiz-Gurrea,
PRECISESADS Clinical Consortium,
Marta E. Alarcón Riquelme,
Carl Brunius,
Antonio Segura-Carretero
Affiliations
Álvaro Fernández-Ochoa
Department of Analytical Chemistry, Faculty of Sciences, University of Granada, Av Fuentenueva s/n, 18071 Granada, Spain
Rosa Quirantes-Piné
Research and Development of Functional Food Centre (CIDAF), Health Science Technological Park, Av del Conocimiento, No. 37, s/n, 18016 Granada, Spain
Isabel Borrás-Linares
Research and Development of Functional Food Centre (CIDAF), Health Science Technological Park, Av del Conocimiento, No. 37, s/n, 18016 Granada, Spain
María de la Luz Cádiz-Gurrea
Department of Analytical Chemistry, Faculty of Sciences, University of Granada, Av Fuentenueva s/n, 18071 Granada, Spain
PRECISESADS Clinical Consortium
Membership of the PRECISESADS Clinical Consortium is provided in the Acknowledgment section.
Marta E. Alarcón Riquelme
Centre for Genomics and Oncological Research (GENYO), Pfizer-University of Granada-Andalusian Government, Health Science Technological Park, Av de la Ilustración 114, 18016 Granada, Spain
Carl Brunius
Department of Biology and Biological Engineering, Chalmers University of Technology, SE-412 96 Gothenburg, Sweden
Antonio Segura-Carretero
Department of Analytical Chemistry, Faculty of Sciences, University of Granada, Av Fuentenueva s/n, 18071 Granada, Spain
Pre-processing of LC-MS data is a critical step in untargeted metabolomics studies, since correct biological interpretation depends on it. Several pre-processing tools have been developed, and they can be classified as either commercial or open source software. This case report compares two specific methodologies, Agilent Profinder and an R-based pipeline, in a metabolomic study with a large number of samples. Specifically, 369 plasma samples were analyzed by HPLC-ESI-QTOF-MS. The collected data were pre-processed with both methodologies and then evaluated on several parameters (number of peaks, degree of missingness, quality of the peaks, degree of misalignment, and robustness in multivariate models). The vendor software was characterized by ease of use, a user-friendly interface, and good-quality graphs. The open source methodology corrected drift due to between- and within-batch effects more effectively. In addition, the evaluated statistical methods achieved better classification results with higher parsimony on data from the open source methodology, indicating higher data quality. Although both methodologies have strengths and weaknesses, the open source methodology seems more appropriate for studies with a large number of samples, mainly because of its higher capacity and its versatility, which allows different packages, functions, and methods to be combined in a single environment.