Statistical Workflow for Feature Selection in Human Metabolomics Data

Joseph Antonelli; Brian L. Claggett; Mir Henglin; Andy Kim; Gavin Ovsak; Nicole Kim; Katherine Deng; Kevin Rao; Octavia Tyagi; Jeramie D. Watrous; Kim A. Lagerborg; Pavel V. Hushcha; Olga V. Demler; Samia Mora; Teemu J. Niiranen; Alexandre C. Pereira; Mohit Jain; Susan Cheng

doi:10.3390/metabo9070143

Metabolites (Jul 2019)

Affiliations

Joseph Antonelli: Department of Statistics, University of Florida, Gainesville, FL 32611, USA
Brian L. Claggett: Cardiovascular Division, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA
Mir Henglin: Cardiovascular Division, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA
Andy Kim: Cardiovascular Division, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA
Gavin Ovsak: Cardiovascular Division, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA
Nicole Kim: Cardiovascular Division, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA
Katherine Deng: Cardiovascular Division, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA
Kevin Rao: Cardiovascular Division, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA
Octavia Tyagi: Cardiovascular Division, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA
Jeramie D. Watrous: Departments of Medicine & Pharmacology, University of California San Diego, La Jolla, CA 92093, USA
Kim A. Lagerborg: Smidt Heart Institute, Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA
Pavel V. Hushcha: Cardiovascular Division, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA
Olga V. Demler: Preventive Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA
Samia Mora: Cardiovascular Division, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA
Teemu J. Niiranen: National Institute for Health and Welfare, FI 00271 Helsinki, Finland
Alexandre C. Pereira: Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
Mohit Jain: Smidt Heart Institute, Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA
Susan Cheng: Cardiovascular Division, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA

DOI: https://doi.org/10.3390/metabo9070143
Journal volume & issue: Vol. 9, no. 7, p. 143

Abstract

High-throughput metabolomics investigations, when conducted in large human cohorts, represent a potentially powerful tool for elucidating the biochemical diversity underlying human health and disease. Large-scale metabolomics data sources, generated using either targeted or nontargeted platforms, are becoming more common. Appropriate statistical analysis of these complex high-dimensional data will be critical for extracting meaningful results from such large-scale human metabolomics studies. Therefore, we consider the statistical analytical approaches that have been employed in prior human metabolomics studies. Based on the lessons learned and collective experience to date in the field, we offer a step-by-step framework for pursuing statistical analyses of cohort-based human metabolomics data, with a focus on feature selection. We discuss the range of options and approaches that may be employed at each stage of data management, analysis, and interpretation, and offer guidance on the analytical decisions that need to be considered over the course of implementing a data analysis workflow. Certain pervasive analytical challenges facing the field warrant ongoing focused research. Addressing these challenges, particularly those related to analyzing human metabolomics data, will allow for greater standardization of, and advances in, how research in the field is practiced. In turn, such major analytical advances will lead to substantial improvements in the overall contributions of human metabolomics investigations.

Published in Metabolites

ISSN: 2218-1989 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Microbiology
Website: http://www.mdpi.com/journal/metabolites
