Reproducible biomedical benchmarking in the cloud: lessons from crowd-sourced data challenges

Kyle Ellrott; Alex Buchanan; Allison Creason; Michael Mason; Thomas Schaffter; Bruce Hoff; James Eddy; John M. Chilton; Thomas Yu; Joshua M. Stuart; Julio Saez-Rodriguez; Gustavo Stolovitzky; Paul C. Boutros; Justin Guinney

doi:10.1186/s13059-019-1794-0

Genome Biology (Sep 2019)

Affiliations

Kyle Ellrott: Biomedical Engineering, Oregon Health and Science University
Alex Buchanan: Biomedical Engineering, Oregon Health and Science University
Allison Creason: Biomedical Engineering, Oregon Health and Science University
Michael Mason: Sage Bionetworks
Thomas Schaffter: IBM Research
Bruce Hoff: Sage Bionetworks
James Eddy: Sage Bionetworks
John M. Chilton: Department of Biochemistry and Molecular Biology, The Pennsylvania State University
Thomas Yu: Sage Bionetworks
Joshua M. Stuart: University of California, Santa Cruz
Julio Saez-Rodriguez: Institute for Computational Biomedicine, Heidelberg University, Faculty of Medicine and Heidelberg University Hospital, Bioquant
Gustavo Stolovitzky: IBM Research
Paul C. Boutros: Ontario Institute for Cancer Research
Justin Guinney: Sage Bionetworks

DOI: https://doi.org/10.1186/s13059-019-1794-0
Journal volume & issue: Vol. 20, no. 1, pp. 1–9

Abstract

Challenges are achieving broad acceptance for addressing many biomedical questions and enabling tool assessment. But ensuring that the methods evaluated are reproducible and reusable is complicated by the diversity of software architectures, input and output file formats, and computing environments. To mitigate these problems, some challenges have leveraged new virtualization and compute methods, requiring participants to submit cloud-ready software packages. We review recent data challenges with innovative approaches to model reproducibility and data sharing, and outline key lessons for improving quantitative biomedical data analysis through crowd-sourced benchmarking challenges.

Published in Genome Biology

ISSN: 1474-760X (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Science: Biology (General): Genetics
Website: https://genomebiology.biomedcentral.com/
