Recommendations for the creation of benchmark datasets for reproducible artificial intelligence in radiology

Nikos Sourlos; Rozemarijn Vliegenthart; Joao Santinha; Michail E. Klontzas; Renato Cuocolo; Merel Huisman; Peter van Ooijen

doi:10.1186/s13244-024-01833-2

Insights into Imaging (Oct 2024)

Affiliations

Nikos Sourlos: Department of Radiology, University Medical Center Groningen
Rozemarijn Vliegenthart: Department of Radiology, University Medical Center Groningen
Joao Santinha: Digital Surgery LAB, Champalimaud Foundation, Champalimaud Clinical Centre
Michail E. Klontzas: Department of Medical Imaging, University Hospital of Heraklion
Renato Cuocolo: Department of Medicine, Surgery, and Dentistry, University of Salerno
Merel Huisman: Department of Radiology and Nuclear Medicine, Radboud University Medical Center
Peter van Ooijen: DataScience Center in Health, University Medical Center Groningen

DOI: https://doi.org/10.1186/s13244-024-01833-2
Journal volume & issue: Vol. 15, no. 1
pp. 1 – 12

Abstract

Various healthcare domains, including radiology, have witnessed successful preliminary implementation of artificial intelligence (AI) solutions, though limited generalizability hinders their widespread adoption. Currently, most research groups and industry have limited access to the data needed for external validation studies. The creation and accessibility of benchmark datasets to validate such solutions represent a critical step towards generalizability, one in which an array of aspects, ranging from preprocessing to regulatory issues and biostatistical principles, comes into play. In this article, the authors provide recommendations for the creation of benchmark datasets in radiology, explain current limitations in this realm, and explore potential new approaches.

Clinical relevance statement

Benchmark datasets, by facilitating validation of AI software performance, can contribute to the adoption of AI in clinical practice.

Key Points

Benchmark datasets are essential for the validation of AI software performance.
Factors such as image quality and representativeness of cases should be considered.
Benchmark datasets can help adoption by increasing the trustworthiness and robustness of AI.

Published in Insights into Imaging

ISSN: 1869-4101 (Online)
Publisher: SpringerOpen
Country of publisher: Germany
LCC subjects: Medicine: Medicine (General): Medical physics. Medical radiology. Nuclear medicine
Website: http://www.springer.com/13244
