Artificial intelligence in commercial fracture detection products: a systematic review and meta-analysis of diagnostic test accuracy

Julius Husarek; Silvan Hess; Sam Razaeian; Thomas D. Ruder; Stephan Sehmisch; Martin Müller; Emmanouil Liodakis

doi:10.1038/s41598-024-73058-8

Scientific Reports (Oct 2024)

Affiliations

Julius Husarek: Department of Orthopaedic Surgery and Traumatology, Bern University Hospital, Inselspital, University of Bern
Silvan Hess: Department of Orthopaedic Surgery and Traumatology, Bern University Hospital, Inselspital, University of Bern
Sam Razaeian: Department for Trauma, Hand and Reconstructive Surgery, Saarland University
Thomas D. Ruder: University Institute of Diagnostic, Interventional and Pediatric Radiology, Inselspital, Bern University Hospital, University of Bern
Stephan Sehmisch: Department of Trauma Surgery, Hannover Medical School
Martin Müller: Department of Emergency Medicine, Bern University Hospital, Inselspital, University of Bern
Emmanouil Liodakis: Department for Trauma, Hand and Reconstructive Surgery, Saarland University

DOI: https://doi.org/10.1038/s41598-024-73058-8
Journal volume & issue: Vol. 14, no. 1, pp. 1–18

Abstract

Conventional radiography (CR) is primarily utilized for fracture diagnosis. Artificial intelligence (AI) for CR is a rapidly growing field aimed at enhancing efficiency and increasing diagnostic accuracy. However, the diagnostic performance of commercially available AI fracture detection solutions (CAAI-FDS) for CR in various anatomical regions, their synergy with human assessment, and the influence of industry funding on reported accuracy are unknown. Peer-reviewed diagnostic test accuracy (DTA) studies were identified through a systematic search of PubMed and Embase. Diagnostic performance measures were extracted for subgroups defined by product, type of rater (stand-alone AI, human unaided, human AI-aided), funding, and anatomical region. Pooled measures were obtained with a bivariate random effects model. The impact of rater type was evaluated with comparative meta-analysis. Seventeen DTA studies of seven CAAI-FDS analyzing 38,978 x-rays with 8,150 fractures were included. Stand-alone AI studies (n = 15) evaluated five CAAI-FDS: four with good sensitivities (> 90%) and moderate specificities (80–90%), and one with very poor sensitivity (< 60%) and excellent specificity (> 95%). Pooled sensitivities were good to excellent and specificities were moderate to good in all anatomical regions (n = 7) apart from ribs (n = 4; poor sensitivity / moderate specificity) and spine (n = 4; excellent sensitivity / poor specificity). Funded studies (n = 4) had higher sensitivity (+5%) and lower specificity (−4%) than non-funded studies (n = 11). Sensitivity did not differ significantly between stand-alone AI and human AI-aided ratings (p = 0.316), but specificity was significantly higher in the latter group (p < 0.001). Sensitivity was significantly lower in human unaided ratings than in both human AI-aided and stand-alone AI ratings (both p ≤ 0.001); specificity was higher in human unaided ratings than in stand-alone AI ratings (p < 0.001) and did not differ significantly from human AI-aided ratings (p = 0.316). The study demonstrates good diagnostic accuracy across most CAAI-FDS and anatomical regions, with the highest performance achieved when used in conjunction with human assessment. Diagnostic accuracy appears lower for spine and rib fractures. The impact of industry funding on reported performance is small.
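
For readers less familiar with the accuracy measures reported in the abstract, the following minimal Python sketch shows how sensitivity and specificity are derived from the 2x2 table of a single study. The counts are hypothetical and are not taken from the review, and the class and function names are illustrative only. The sketch does not reproduce the bivariate random effects pooling described by the authors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """2x2 table for one rater against the reference standard."""
    tp: int  # fractures correctly flagged
    fp: int  # non-fractures incorrectly flagged
    fn: int  # fractures missed
    tn: int  # non-fractures correctly cleared


def sensitivity(c: ConfusionCounts) -> float:
    """Share of true fractures that were detected."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Share of non-fractures that were correctly cleared."""
    return c.tn / (c.tn + c.fp)


# Hypothetical counts for illustration only (not data from the review).
example = ConfusionCounts(tp=90, fp=15, fn=10, tn=85)
print(f"sensitivity = {sensitivity(example):.2f}")
print(f"specificity = {specificity(example):.2f}")
```

A meta-analysis of diagnostic test accuracy starts from such per-study tables; pooling them across studies requires a dedicated statistical model rather than a simple average of these two ratios.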

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/
