Inference and Prediction Diverge in Biomedicine

Danilo Bzdok; Denis Engemann; Bertrand Thirion

Patterns (Nov 2020)

Affiliations

Danilo Bzdok: Mila – Quebec Artificial Intelligence Institute, Montreal, QC, Canada; Department of Biomedical Engineering, McConnell Brain Imaging Centre (BIC), Montreal Neurological Institute (MNI), Faculty of Medicine, School of Computer Science, McGill University, Montreal, QC, Canada; Corresponding author
Denis Engemann: INRIA Saclay, CEA, Université Paris-Saclay, bat 145, CEA Saclay, 91191 Gif-sur-Yvette, France; Department of Neurology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
Bertrand Thirion: INRIA Saclay, CEA, Université Paris-Saclay, bat 145, CEA Saclay, 91191 Gif-sur-Yvette, France

Journal volume & issue: Vol. 1, no. 8, p. 100119

Abstract

Summary: In the 20th century, many advances in biological knowledge and evidence-based medicine were supported by p values and accompanying methods. In the early 21st century, ambitions toward precision medicine place a premium on detailed predictions for single individuals. The shift causes tension between traditional regression methods used to infer statistically significant group differences and burgeoning predictive analysis tools suited to forecast an individual's future. Our comparison applies linear models for identifying significant contributing variables and for finding the most predictive variable sets. In systematic data simulations and common medical datasets, we explored how variables identified as significantly relevant and variables identified as predictively relevant can agree or diverge. Across analysis scenarios, even small predictive performances typically coincided with finding underlying significant statistical relationships, but not vice versa. More complete understanding of different ways to define “important” associations is a prerequisite for reproducible research and advances toward personalizing medical care.

The Bigger Picture: Across research communities, the analysis goals of inference and prediction are two sides of a coin. Many empirical studies leaning on statistical significance typically focus interpretation on the best p values obtained for one or a few variables. In contrast, many empirical studies dedicated to prediction are backed up by cross-validated model performance on fresh data points.

In a future of single-patient prediction from big biomedical data, it may become central that modeling for inference and modeling for prediction are related but importantly different. The relevant subset of variables identified based on p values or based on predictive value can converge or diverge depending on the data scenario. We show that diverging conclusions can emerge even when the data are identical and when widespread linear models are used. Awareness of the relative strengths and weaknesses of both “data-analysis cultures” may become unavoidable in navigating between complementary goals in scientific inquiry.
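
The following sketch is not the authors' analysis. It shows one way the two criteria described in the abstract can diverge on identical data when a plain linear model is used. The assumptions are ours: a single simulated predictor with a weak true slope (0.2), a permutation test standing in for a parametric p value, and 5-fold cross-validation for out-of-sample R^2. All function names are hypothetical.

```python
import random
import statistics


def fit_line(x, y):
    """Ordinary least squares with one predictor: returns (intercept, slope)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def permutation_p_value(x, y, n_perm=200, seed=0):
    """Inference view: two-sided permutation p value for the slope."""
    rng = random.Random(seed)
    observed = abs(fit_line(x, y)[1])
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break any x-y association
        if abs(fit_line(x, y_perm)[1]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


def cross_validated_r2(x, y, k=5):
    """Prediction view: out-of-sample R^2 pooled over k folds."""
    n = len(x)
    ss_res = ss_tot = 0.0
    for fold in range(k):
        test = range(fold, n, k)
        test_set = set(test)
        train = [i for i in range(n) if i not in test_set]
        b0, b1 = fit_line([x[i] for i in train], [y[i] for i in train])
        # Baseline: predict the training mean for every held-out point.
        train_mean = statistics.fmean(y[i] for i in train)
        for i in test:
            ss_res += (y[i] - (b0 + b1 * x[i])) ** 2
            ss_tot += (y[i] - train_mean) ** 2
    return 1.0 - ss_res / ss_tot


# Simulated data (assumed setup): weak true effect, large sample.
rng = random.Random(42)
n = 1000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [0.2 * xi + rng.gauss(0.0, 1.0) for xi in x]

p_value = permutation_p_value(x, y)
r2 = cross_validated_r2(x, y)
print(f"permutation p value: {p_value:.4f}")
print(f"cross-validated R^2: {r2:.3f}")
```

With a large sample and a weak effect, the slope is expected to be clearly distinguishable from zero while the model explains only a small share of the variance in held-out data. This toy setup therefore shows a significant variable with little predictive value. It does not reproduce the paper's simulations or datasets.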

Published in Patterns

ISSN: 2666-3899 (Online)
Publisher: Elsevier
Country of publisher: United States
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software
Website: https://www.cell.com/patterns
