Patterns (2020-11-01)

Inference and Prediction Diverge in Biomedicine

  • Danilo Bzdok
  • Denis Engemann
  • Bertrand Thirion

Journal volume & issue
Vol. 1, no. 8
p. 100119



Summary: In the 20th century, many advances in biological knowledge and evidence-based medicine were supported by p values and accompanying methods. In the early 21st century, ambitions toward precision medicine place a premium on detailed predictions for single individuals. The shift causes tension between traditional regression methods used to infer statistically significant group differences and burgeoning predictive analysis tools suited to forecast an individual's future. Our comparison applies linear models for identifying significant contributing variables and for finding the most predictive variable sets. In systematic data simulations and common medical datasets, we explored how variables identified as significantly relevant and variables identified as predictively relevant can agree or diverge. Across analysis scenarios, even small predictive performances typically coincided with finding underlying significant statistical relationships, but not vice versa. More complete understanding of different ways to define “important” associations is a prerequisite for reproducible research and advances toward personalizing medical care.

The Bigger Picture: Across research communities, the analysis goals of inference and prediction are two sides of a coin. Many empirical studies leaning on statistical significance typically focus interpretation on the best p values obtained for one or a few variables. In contrast, many empirical studies dedicated to prediction are backed up by cross-validated model performance on fresh data points. In a future of single-patient prediction from big biomedical data, it may become central that modeling for inference and modeling for prediction are related but importantly different. The relevant subset of variables identified based on p values or based on predictive value can converge or diverge depending on the data scenario. We show that diverging conclusions can emerge even when the data are identical and when widespread linear models are used. Awareness of the relative strengths and weaknesses of both “data-analysis cultures” may become unavoidable in navigating between complementary goals in scientific inquiry.
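
The contrast drawn in the Summary between a significant variable and a predictive one can be illustrated with a small simulation. The sketch below is not the authors' analysis pipeline. It assumes a single predictor, a deliberately weak effect, Gaussian noise, and a normal approximation to the t distribution for the p value. All function names, the effect size, and the sample size are hypothetical choices made for illustration, and only the Python standard library is used.

```python
import math
import random
from statistics import NormalDist, mean


def simulate(n, effect, seed=0):
    """Draw n samples of y = effect * x + noise (illustrative setup)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [effect * xi + rng.gauss(0.0, 1.0) for xi in x]
    return x, y


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def slope_p_value(x, y):
    """Two-sided p value for the slope.

    Assumption: the t distribution is approximated by a standard
    normal, which is only reasonable for large samples.
    """
    n = len(x)
    a, b = fit_line(x, y)
    mx = mean(x)
    sxx = sum((xi - mx) ** 2 for xi in x)
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    z = abs(b / se)
    return 2.0 * (1.0 - NormalDist().cdf(z))


def cross_validated_r2(x, y, k=5):
    """Out-of-sample R^2, pooled over k folds.

    The baseline in each fold is the training-set mean of y, so the
    score measures the gain over a prediction that ignores x.
    """
    idx = list(range(len(x)))
    ss_res = 0.0
    ss_tot = 0.0
    for fold in range(k):
        test = idx[fold::k]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        baseline = mean([y[i] for i in train])
        ss_res += sum((y[i] - (a + b * x[i])) ** 2 for i in test)
        ss_tot += sum((y[i] - baseline) ** 2 for i in test)
    return 1.0 - ss_res / ss_tot


# Weak effect, large sample: values chosen only for illustration.
x, y = simulate(n=2000, effect=0.15)
p_value = slope_p_value(x, y)
r2 = cross_validated_r2(x, y)
print(f"p value for slope: {p_value:.2g}")
print(f"cross-validated R^2: {r2:.3f}")
```

With a weak effect and a large sample, the slope is expected to pass a conventional significance threshold while the held-out R^2 stays small, since the variable explains little of any single observation. Changing the hypothetical `effect` and `n` arguments shows how the two criteria move together or apart.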