Metabolites (Sep 2021)
A New Pipeline for the Normalization and Pooling of Metabolomics Data
- Vivian Viallon,
- Mathilde His,
- Sabina Rinaldi,
- Marie Breeur,
- Audrey Gicquiau,
- Bertrand Hemon,
- Kim Overvad,
- Anne Tjønneland,
- Agnetha Linn Rostgaard-Hansen,
- Joseph A. Rothwell,
- Lucie Lecuyer,
- Gianluca Severi,
- Rudolf Kaaks,
- Theron Johnson,
- Matthias B. Schulze,
- Domenico Palli,
- Claudia Agnoli,
- Salvatore Panico,
- Rosario Tumino,
- Fulvio Ricceri,
- W. M. Monique Verschuren,
- Peter Engelfriet,
- Charlotte Onland-Moret,
- Roel Vermeulen,
- Therese Haugdahl Nøst,
- Ilona Urbarova,
- Raul Zamora-Ros,
- Miguel Rodriguez-Barranco,
- Pilar Amiano,
- José Maria Huerta,
- Eva Ardanaz,
- Olle Melander,
- Filip Ottoson,
- Linda Vidman,
- Matilda Rentoft,
- Julie A. Schmidt,
- Ruth C. Travis,
- Elisabete Weiderpass,
- Mattias Johansson,
- Laure Dossus,
- Mazda Jenab,
- Marc J. Gunter,
- Justo Lorenzo Bermejo,
- Dominique Scherer,
- Reza M. Salek,
- Pekka Keski-Rahkonen,
- Pietro Ferrari
Affiliations
- Vivian Viallon
- Nutrition and Metabolism Branch, International Agency for Research on Cancer (IARC-WHO), 69008 Lyon, France
- Mathilde His
- Nutrition and Metabolism Branch, International Agency for Research on Cancer (IARC-WHO), 69008 Lyon, France
- Sabina Rinaldi
- Nutrition and Metabolism Branch, International Agency for Research on Cancer (IARC-WHO), 69008 Lyon, France
- Marie Breeur
- Nutrition and Metabolism Branch, International Agency for Research on Cancer (IARC-WHO), 69008 Lyon, France
- Audrey Gicquiau
- Nutrition and Metabolism Branch, International Agency for Research on Cancer (IARC-WHO), 69008 Lyon, France
- Bertrand Hemon
- Nutrition and Metabolism Branch, International Agency for Research on Cancer (IARC-WHO), 69008 Lyon, France
- Kim Overvad
- Department of Public Health, Aarhus University, Bartholins Alle 2, DK-8000 Aarhus, Denmark
- Anne Tjønneland
- Danish Cancer Society Research Center, DK-2100 Copenhagen, Denmark
- Agnetha Linn Rostgaard-Hansen
- Danish Cancer Society Research Center, DK-2100 Copenhagen, Denmark
- Joseph A. Rothwell
- UVSQ, Inserm, CESP U1018, “Exposome and Heredity” Team, Université Paris-Saclay, Gustave Roussy, 94800 Villejuif, France
- Lucie Lecuyer
- UVSQ, Inserm, CESP U1018, “Exposome and Heredity” Team, Université Paris-Saclay, Gustave Roussy, 94800 Villejuif, France
- Gianluca Severi
- UVSQ, Inserm, CESP U1018, “Exposome and Heredity” Team, Université Paris-Saclay, Gustave Roussy, 94800 Villejuif, France
- Rudolf Kaaks
- Division of Cancer Epidemiology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Theron Johnson
- Division of Cancer Epidemiology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Matthias B. Schulze
- Department of Molecular Epidemiology, German Institute of Human Nutrition Potsdam Rehbruecke, Arthur-Scheunert-Allee 114-116, 14558 Nuthetal, Germany
- Domenico Palli
- Cancer Risk Factors and Life-Style Epidemiology Unit, Institute for Cancer Research, Prevention and Clinical Network (ISPRO), 50139 Florence, Italy
- Claudia Agnoli
- Epidemiology and Prevention Unit Department of Research, Fondazione IRCCS—Istituto Nazionale dei Tumori, 20133 Milan, Italy
- Salvatore Panico
- Dipartimento di Medicina Clinica e Chirurgia, Federico II University, 80131 Naples, Italy
- Rosario Tumino
- Cancer Registry and Histopathology Department, Provincial Health Authority (ASP 7), 97100 Ragusa, Italy
- Fulvio Ricceri
- Department of Clinical and Biological Sciences, University of Turin, 10043 Orbassano, Italy
- W. M. Monique Verschuren
- National Institute for Public Health and the Environment, Centre for Nutrition, Prevention and Health Services, Antonie van Leeuwenhoeklaan 9, 3721 MA Bilthoven, The Netherlands
- Peter Engelfriet
- National Institute for Public Health and the Environment, Centre for Nutrition, Prevention and Health Services, Antonie van Leeuwenhoeklaan 9, 3721 MA Bilthoven, The Netherlands
- Charlotte Onland-Moret
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, 3584 CG Utrecht, The Netherlands
- Roel Vermeulen
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, 3584 CG Utrecht, The Netherlands
- Therese Haugdahl Nøst
- Department of Community Medicine, Faculty of Health Sciences, UiT The Arctic University of Norway, P.O. Box 6050, 9037 Tromsø, Norway
- Ilona Urbarova
- Department of Community Medicine, Faculty of Health Sciences, UiT The Arctic University of Norway, P.O. Box 6050, 9037 Tromsø, Norway
- Raul Zamora-Ros
- Unit of Nutrition and Cancer, Cancer Epidemiology Research Programme, Catalan Institute of Oncology, Bellvitge Biomedical Research Institute (IDIBELL), 08908 L’Hospitalet de Llobregat, Spain
- Miguel Rodriguez-Barranco
- Escuela Andaluza de Salud Pública (EASP), 18011 Granada, Spain
- Pilar Amiano
- Centro de Investigación Biomédica en Red de Epidemiología y Salud Pública (CIBERESP), 28029 Madrid, Spain
- José Maria Huerta
- Centro de Investigación Biomédica en Red de Epidemiología y Salud Pública (CIBERESP), 28029 Madrid, Spain
- Eva Ardanaz
- Centro de Investigación Biomédica en Red de Epidemiología y Salud Pública (CIBERESP), 28029 Madrid, Spain
- Olle Melander
- Department of Clinical Sciences, Lund University, SE-21 428 Malmö, Sweden
- Filip Ottoson
- Department of Immunotechnology, Lund University, SE-22 100 Lund, Sweden
- Linda Vidman
- Department of Radiation Sciences, Oncology, Umeå University, SE-901 87 Umeå, Sweden
- Matilda Rentoft
- Department of Radiation Sciences, Oncology, Umeå University, SE-901 87 Umeå, Sweden
- Julie A. Schmidt
- Cancer Epidemiology Unit, Nuffield Department of Population Health, University of Oxford, Oxford OX3 7LF, UK
- Ruth C. Travis
- Cancer Epidemiology Unit, Nuffield Department of Population Health, University of Oxford, Oxford OX3 7LF, UK
- Elisabete Weiderpass
- International Agency for Research on Cancer, World Health Organization, 69008 Lyon, France
- Mattias Johansson
- Genomic Epidemiology Branch, International Agency for Research on Cancer (IARC-WHO), 69008 Lyon, France
- Laure Dossus
- Nutrition and Metabolism Branch, International Agency for Research on Cancer (IARC-WHO), 69008 Lyon, France
- Mazda Jenab
- Nutrition and Metabolism Branch, International Agency for Research on Cancer (IARC-WHO), 69008 Lyon, France
- Marc J. Gunter
- Nutrition and Metabolism Branch, International Agency for Research on Cancer (IARC-WHO), 69008 Lyon, France
- Justo Lorenzo Bermejo
- Statistical Genetics Group, Institute of Medical Biometry, University of Heidelberg, 69120 Heidelberg, Germany
- Dominique Scherer
- Statistical Genetics Group, Institute of Medical Biometry, University of Heidelberg, 69120 Heidelberg, Germany
- Reza M. Salek
- Nutrition and Metabolism Branch, International Agency for Research on Cancer (IARC-WHO), 69008 Lyon, France
- Pekka Keski-Rahkonen
- Nutrition and Metabolism Branch, International Agency for Research on Cancer (IARC-WHO), 69008 Lyon, France
- Pietro Ferrari
- Nutrition and Metabolism Branch, International Agency for Research on Cancer (IARC-WHO), 69008 Lyon, France
- DOI
- https://doi.org/10.3390/metabo11090631
- Journal volume & issue
- Vol. 11, no. 9, p. 631
Abstract
Pooling metabolomics data across studies is often desirable to increase the statistical power of the analysis. However, pooling raises methodological challenges, as several preanalytical and analytical factors can introduce differences in measured concentrations and in variability between datasets. Specifically, studies may use different sample types (e.g., serum versus plasma) that are collected, treated, and stored according to different protocols and assayed in different laboratories using different instruments. To address these issues, a new pipeline was developed to normalize and pool metabolomics data through a set of sequential steps: (i) exclusion of the least informative observations and metabolites, removal of outliers, and imputation of missing data; (ii) identification of the main sources of variability through principal component partial R-square (PC-PR2) analysis; (iii) application of linear mixed models to remove unwanted variability, including that due to the samples’ originating study and batch, while preserving biological variation and accounting for potential differences in residual variances across studies. This pipeline was applied to targeted metabolomics data acquired using Biocrates AbsoluteIDQ kits in eight case-control studies nested within the European Prospective Investigation into Cancer and Nutrition (EPIC) cohort. Comprehensive examination of the metabolomics measurements indicated that the pipeline improved the comparability of data across the studies. The pipeline can be adapted to normalize other molecular data, including biomarkers as well as proteomics data, and could be used for pooling molecular datasets, for example in international consortia, to limit biases introduced by inter-study variability. This versatility makes the work of potential interest to molecular epidemiologists.
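As a rough illustration of what removing batch-related variability in step (iii) means, the sketch below centers one log-transformed metabolite within each batch. It is a deliberately simplified fixed-effects stand-in, not the linear mixed model used in the pipeline: it applies no shrinkage, ignores study-specific residual variances, and does not protect biological covariates. The function name `center_by_batch` and the toy data are hypothetical.

```python
import numpy as np


def center_by_batch(values, batches):
    """Remove batch-level mean shifts from one log-transformed metabolite.

    Simplified assumption: each batch mean is replaced by the
    sample-weighted overall mean. A mixed model would instead estimate
    batch as a random effect and shrink small batches toward the mean.
    """
    values = np.asarray(values, dtype=float)
    batches = np.asarray(batches)
    overall_mean = values.mean()
    normalized = np.empty_like(values)
    for batch in np.unique(batches):
        mask = batches == batch
        # Subtract the batch mean, then restore the overall level.
        normalized[mask] = values[mask] - values[mask].mean() + overall_mean
    return normalized


# Hypothetical toy data: two batches separated by a shift of about 2 units.
log_conc = np.array([1.0, 1.2, 0.8, 3.0, 3.1, 2.9])
batch_id = np.array(["A", "A", "A", "B", "B", "B"])
normalized = center_by_batch(log_conc, batch_id)
```

After centering, both batches share the same mean while the within-batch differences between samples are unchanged. This is the property that the mixed-model step aims for in a more careful way.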
Keywords