Department of Pharmacology and Toxicology, Institute of Pharmacy and Center for Molecular Biosciences Innsbruck, University of Innsbruck, Innsbruck, Austria
Victoria Schoeffler
Department of Child and Adolescent Psychiatry, Center of Mental Health, University Hospital Würzburg, Würzburg, Germany
Teresa Lüffe
Department of Child and Adolescent Psychiatry, Center of Mental Health, University Hospital Würzburg, Würzburg, Germany
Alexander Dürr
Department of Business and Economics, University of Würzburg, Würzburg, Germany
Rohini Gupta
Institute of Clinical Neurobiology, University Hospital Würzburg, Würzburg, Germany
Manju Sasi
Institute of Clinical Neurobiology, University Hospital Würzburg, Würzburg, Germany
Department of Pharmacology and Toxicology, Institute of Pharmacy and Center for Molecular Biosciences Innsbruck, University of Innsbruck, Innsbruck, Austria
Bioimage analysis of fluorescent labels is widely used in the life sciences. Recent advances in deep learning (DL) allow automating time-consuming manual image analysis processes based on annotated training data. However, manual annotation of fluorescent features with a low signal-to-noise ratio is somewhat subjective. Training DL models on subjective annotations may be unstable or yield biased models. In turn, these models may be unable to reliably detect biological effects. An analysis pipeline integrating data annotation, ground truth estimation, and model training can mitigate this risk. To evaluate this integrated process, we compared different DL-based analysis approaches. With data from two model organisms (mice, zebrafish) and five laboratories, we show that ground truth estimation from multiple human annotators helps to establish objectivity in fluorescent feature annotations. Furthermore, ensembles of multiple models trained on the estimated ground truth establish reliability and validity. Our research provides guidelines for reproducible DL-based bioimage analyses.