Recommendations for improving statistical inference in population genomics

Parul Johri; Charles F. Aquadro; Mark Beaumont; Brian Charlesworth; Laurent Excoffier; Adam Eyre-Walker; Peter D. Keightley; Michael Lynch; Gil McVean; Bret A. Payseur; Susanne P. Pfeifer; Wolfgang Stephan; Jeffrey D. Jensen

PLoS Biology (May 2022)

Journal volume & issue: Vol. 20, no. 5

Abstract

The field of population genomics has grown rapidly in response to the recent advent of affordable, large-scale sequencing technologies. As opposed to the situation during the majority of the 20th century, in which the development of theoretical and statistical population genetic insights outpaced the generation of data to which they could be applied, genomic data are now being produced at a far greater rate than they can be meaningfully analyzed and interpreted. With this wealth of data has come a tendency to focus on fitting specific (and often rather idiosyncratic) models to data, at the expense of a careful exploration of the range of possible underlying evolutionary processes. For example, the approach of directly investigating models of adaptive evolution in each newly sequenced population or species often neglects the fact that a thorough characterization of ubiquitous nonadaptive processes is a prerequisite for accurate inference. We here describe the perils of these tendencies, present our consensus views on current best practices in population genomic data analysis, and highlight areas of statistical inference and theory that are in need of further attention. Thereby, we argue for the importance of defining a biologically relevant baseline model tuned to the details of each new analysis, of skepticism and scrutiny in interpreting model fitting results, and of carefully defining addressable hypotheses and underlying uncertainties.

Genomic data are now being produced at a far greater rate than they can be meaningfully analyzed and interpreted, leading to some questionable use of statistical models. In this Consensus View, the authors provide recommendations for current best practices in population genomic data analysis and highlight areas of statistical inference and theory that are in need of further attention.

Published in PLoS Biology

ISSN: 1544-9173 (Print); 1545-7885 (Online)
Publisher: Public Library of Science (PLoS)
Country of publisher: United States
LCC subjects: Science: Biology (General)
Website: https://journals.plos.org/plosbiology/
