Dataset decay and the problem of sequential analyses on open datasets

William Hedley Thompson; Jessey Wright; Patrick G Bissett; Russell A Poldrack

doi:10.7554/eLife.53498

eLife (May 2020)

Affiliations

William Hedley Thompson: Department of Psychology, Stanford University, Stanford, United States; Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden
Jessey Wright: Department of Psychology, Stanford University, Stanford, United States; Department of Philosophy, Stanford University, Stanford, United States
Patrick G Bissett: Department of Psychology, Stanford University, Stanford, United States
Russell A Poldrack: Department of Psychology, Stanford University, Stanford, United States

DOI: https://doi.org/10.7554/eLife.53498
Journal volume & issue: Vol. 9

Abstract

Open data allows researchers to explore pre-existing datasets in new ways. However, if many researchers reuse the same dataset, multiple statistical testing may increase false positives. Here we demonstrate that sequential hypothesis testing on the same dataset by multiple researchers can inflate error rates. We go on to discuss a number of correction procedures that can reduce the number of false positives, and the challenges associated with these correction procedures.
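
The inflation described in the abstract can be illustrated with a small calculation. The sketch below is not taken from the article: it assumes that every test is independent, that every null hypothesis is true, and that each test uses the same significance level. The function names are hypothetical, and the Bonferroni adjustment is used only as a familiar example of a correction, since the abstract does not name the procedures the authors discuss.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n_tests tests.

    Assumption (not from the article): the tests are independent and
    every null hypothesis is true, each tested at level alpha.
    """
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test threshold under a Bonferroni correction (illustrative choice)."""
    return alpha / n_tests


if __name__ == "__main__":
    alpha = 0.05
    # Each value of k stands for the number of analyses run on one dataset.
    for k in (1, 5, 20, 100):
        uncorrected = familywise_error_rate(alpha, k)
        corrected = familywise_error_rate(bonferroni_alpha(alpha, k), k)
        print(f"tests={k:>3}  uncorrected={uncorrected:.3f}  bonferroni={corrected:.3f}")
```

Under these assumptions the uncorrected rate grows with every additional analysis of the dataset, while the corrected rate stays at or below the nominal level. A correction of this kind needs the total number of tests, which is hard to know when independent groups reuse an open dataset over time.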

Published in eLife

ISSN: 2050-084X (Online)
Publisher: eLife Sciences Publications Ltd
Country of publisher: United Kingdom
LCC subjects: Medicine; Science: Biology (General)
Website: https://elifesciences.org
