PLoS Computational Biology (Mar 2024)

Raising awareness of uncertain choices in empirical data analysis: A teaching concept toward replicable research practices.

  • Maximilian M Mandl,
  • Sabine Hoffmann,
  • Sebastian Bieringer,
  • Anna E Jacob,
  • Marie Kraft,
  • Simon Lemster,
  • Anne-Laure Boulesteix

DOI
https://doi.org/10.1371/journal.pcbi.1011936
Journal volume & issue
Vol. 20, no. 3
p. e1011936

Abstract

Throughout their education and when reading the scientific literature, students may get the impression that there is a unique and correct analysis strategy for every data analysis task and that this analysis strategy will always yield a significant and noteworthy result. This expectation conflicts with a growing realization that there is a multiplicity of possible analysis strategies in empirical research, which will lead to overoptimism and nonreplicable research findings if it is combined with result-dependent selective reporting. Here, we argue that students are often ill-equipped for real-world data analysis tasks and unprepared for the dangers of selectively reporting the most promising results. We present a seminar course intended for advanced undergraduates and beginning graduate students of data analysis fields such as statistics, data science, or bioinformatics that aims to increase the awareness of uncertain choices in the analysis of empirical data and present ways to deal with these choices through theoretical modules and practical hands-on sessions.