Generalizability of deep learning models for dental image analysis

Joachim Krois; Anselmo Garcia Cantu; Akhilanand Chaurasia; Ranjitkumar Patil; Prabhat Kumar Chaudhari; Robert Gaudin; Sascha Gehrung; Falk Schwendicke

doi:10.1038/s41598-021-85454-5

Scientific Reports (Mar 2021)

Affiliations

Joachim Krois: Department of Oral Diagnostics, Digital Health and Health Services Research, Charité - Universitätsmedizin Berlin
Anselmo Garcia Cantu: Department of Oral Diagnostics, Digital Health and Health Services Research, Charité - Universitätsmedizin Berlin
Akhilanand Chaurasia: Department of Oral Medicine and Radiology, King George’s Medical University
Ranjitkumar Patil: Department of Oral Medicine and Radiology, King George’s Medical University
Prabhat Kumar Chaudhari: Division of Orthodontics and Dentofacial Deformities, AIIMS
Robert Gaudin: Department of Oral and Maxillofacial Surgery, Charité - Universitätsmedizin Berlin
Sascha Gehrung: Department of Oral Diagnostics, Digital Health and Health Services Research, Charité - Universitätsmedizin Berlin
Falk Schwendicke: Department of Oral Diagnostics, Digital Health and Health Services Research, Charité - Universitätsmedizin Berlin

DOI: https://doi.org/10.1038/s41598-021-85454-5
Journal volume & issue: Vol. 11, no. 1, pp. 1–7

Abstract

We assessed the generalizability of deep learning models and how to improve it. Our exemplary use case was the detection of apical lesions on panoramic radiographs. We employed two datasets of panoramic radiographs from two centers, one in Germany (Charité, Berlin, n = 650) and one in India (KGMU, Lucknow, n = 650). First, U-Net type models were trained on images from Charité (n = 500) and assessed on test sets from Charité and KGMU (each n = 150). Second, the relevance of image characteristics was explored using pixel-value transformations that aligned the image characteristics of the two datasets. Third, cross-center training effects on generalizability were evaluated by stepwise replacing Charité images with KGMU images in the training set. Last, we assessed the impact of dental status (presence of root-canal fillings or restorations). Models trained only on Charité images showed a (mean ± SD) F1-score of 54.1 ± 0.8% on Charité and 32.7 ± 0.8% on KGMU data (p < 0.001, t-test). Aligning image characteristics between the centers did not improve generalizability. However, gradually increasing the fraction of KGMU images in the training set (from 0 to 100%) improved the F1-score on KGMU images (46.1 ± 0.9%) at the cost of a moderate decrease on Charité images (50.9 ± 0.9%, p < 0.01). Model performance was good on KGMU images showing root-canal fillings and/or restorations, but much lower on KGMU images without them. Our deep learning models were not generalizable across centers. Cross-center training improved generalizability. Notably, dental status, but not image characteristics, was relevant. Understanding the reasons behind limits in generalizability helps to mitigate generalizability problems.
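The cross-center training step described in the abstract (stepwise replacing Charité images with KGMU images while keeping the training set size fixed) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `mixed_training_set` and `f1_score`, the image identifiers, the random sampling, and the 25% step are assumptions made for illustration only. The abstract also does not state at which level (pixel, lesion, or image) true and false positives were counted, so the F1 helper simply takes counts as input.

```python
import random


def mixed_training_set(center_a, center_b, fraction_b, seed=0):
    """Build a training set of len(center_a) items in which a given
    fraction of center_a items is replaced by center_b items.

    Hypothetical helper: the sampling strategy is an assumption,
    not taken from the paper.
    """
    if not 0.0 <= fraction_b <= 1.0:
        raise ValueError("fraction_b must be in [0, 1]")
    n_total = len(center_a)
    n_b = round(fraction_b * n_total)
    if n_b > len(center_b):
        raise ValueError("not enough center_b images for this fraction")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    kept_a = rng.sample(center_a, n_total - n_b)
    added_b = rng.sample(center_b, n_b)
    return kept_a + added_b


def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 if there are no counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical image identifiers; 500 training images per center,
# matching the training set size given in the abstract.
charite = [f"charite_{i}" for i in range(500)]
kgmu = [f"kgmu_{i}" for i in range(500)]

# Assumed 25% steps from 0 to 100% KGMU images.
for fraction in (0.0, 0.25, 0.5, 0.75, 1.0):
    train = mixed_training_set(charite, kgmu, fraction)
    n_kgmu = sum(name.startswith("kgmu_") for name in train)
    print(f"{fraction:.2f}: {len(train)} images, {n_kgmu} from KGMU")
```

Each resulting set would then be used to train a model that is evaluated separately on the Charité and KGMU test sets, which is how the trade-off reported in the abstract becomes visible.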

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/
