On the objectivity, reliability, and validity of deep learning enabled bioimage analyses

Dennis Segebarth; Matthias Griebel; Nikolai Stein; Cora R von Collenberg; Corinna Martin; Dominik Fiedler; Lucas B Comeras; Anupam Sah; Victoria Schoeffler; Teresa Lüffe; Alexander Dürr; Rohini Gupta; Manju Sasi; Christina Lillesaar; Maren D Lange; Ramon O Tasan; Nicolas Singewald; Hans-Christian Pape; Christoph M Flath; Robert Blum

doi:10.7554/eLife.59780

eLife (Oct 2020)

Affiliations

Dennis Segebarth: ORCiD; Institute of Clinical Neurobiology, University Hospital Würzburg, Würzburg, Germany
Matthias Griebel: ORCiD; Department of Business and Economics, University of Würzburg, Würzburg, Germany
Nikolai Stein: Department of Business and Economics, University of Würzburg, Würzburg, Germany
Cora R von Collenberg: Institute of Clinical Neurobiology, University Hospital Würzburg, Würzburg, Germany
Corinna Martin: Institute of Clinical Neurobiology, University Hospital Würzburg, Würzburg, Germany
Dominik Fiedler: Institute of Physiology I, Westfälische Wilhelms-Universität, Münster, Germany
Lucas B Comeras: ORCiD; Department of Pharmacology, Medical University of Innsbruck, Innsbruck, Austria
Anupam Sah: ORCiD; Department of Pharmacology and Toxicology, Institute of Pharmacy and Center for Molecular Biosciences Innsbruck, University of Innsbruck, Innsbruck, Austria
Victoria Schoeffler: Department of Child and Adolescent Psychiatry, Center of Mental Health, University Hospital Würzburg, Würzburg, Germany
Teresa Lüffe: Department of Child and Adolescent Psychiatry, Center of Mental Health, University Hospital Würzburg, Würzburg, Germany
Alexander Dürr: Department of Business and Economics, University of Würzburg, Würzburg, Germany
Rohini Gupta: Institute of Clinical Neurobiology, University Hospital Würzburg, Würzburg, Germany
Manju Sasi: Institute of Clinical Neurobiology, University Hospital Würzburg, Würzburg, Germany
Christina Lillesaar: ORCiD; Department of Child and Adolescent Psychiatry, Center of Mental Health, University Hospital Würzburg, Würzburg, Germany
Maren D Lange: Institute of Physiology I, Westfälische Wilhelms-Universität, Münster, Germany
Ramon O Tasan: Department of Pharmacology, Medical University of Innsbruck, Innsbruck, Austria
Nicolas Singewald: ORCiD; Department of Pharmacology and Toxicology, Institute of Pharmacy and Center for Molecular Biosciences Innsbruck, University of Innsbruck, Innsbruck, Austria
Hans-Christian Pape: ORCiD; Institute of Physiology I, Westfälische Wilhelms-Universität, Münster, Germany
Christoph M Flath: ORCiD; Department of Business and Economics, University of Würzburg, Würzburg, Germany
Robert Blum: ORCiD; Institute of Clinical Neurobiology, University Hospital Würzburg, Würzburg, Germany; Comprehensive Anxiety Center, Würzburg, Germany

DOI: https://doi.org/10.7554/eLife.59780
Journal volume & issue: Vol. 9

Abstract

Bioimage analysis of fluorescent labels is widely used in the life sciences. Recent advances in deep learning (DL) allow automating time-consuming manual image analysis processes based on annotated training data. However, manual annotation of fluorescent features with a low signal-to-noise ratio is somewhat subjective. Training DL models on subjective annotations may be unstable or yield biased models. In turn, these models may be unable to reliably detect biological effects. An analysis pipeline integrating data annotation, ground truth estimation, and model training can mitigate this risk. To evaluate this integrated process, we compared different DL-based analysis approaches. With data from two model organisms (mice, zebrafish) and five laboratories, we show that ground truth estimation from multiple human annotators helps to establish objectivity in fluorescent feature annotations. Furthermore, ensembles of multiple models trained on the estimated ground truth establish reliability and validity. Our research provides guidelines for reproducible DL-based bioimage analyses.

Published in eLife

ISSN: 2050-084X (Online)
Publisher: eLife Sciences Publications Ltd
Country of publisher: United Kingdom
LCC subjects: Medicine; Science: Biology (General)
Website: https://elifesciences.org
