PLoS ONE (Jan 2020)

Performance of deep learning to detect mastoiditis using multiple conventional radiographs of mastoid.

  • Kyong Joon Lee,
  • Inseon Ryoo,
  • Dongjun Choi,
  • Leonard Sunwoo,
  • Sung-Hye You,
  • Hye Na Jung

DOI
https://doi.org/10.1371/journal.pone.0241796
Journal volume & issue
Vol. 15, no. 11
p. e0241796

Abstract

Objectives

This study aimed to compare the diagnostic performance of a deep learning algorithm trained on a single view (anterior-posterior (AP) or lateral view) with that of an algorithm trained on multiple views (both views together) in diagnosing mastoiditis on mastoid series, and to compare the diagnostic performance of the algorithm with that of radiologists.

Methods

A total of 9,988 mastoid series (AP and lateral views) were classified as normal or abnormal (mastoiditis) based on radiographic findings. Among them, 792 image sets with temporal bone CT were designated as the gold standard test set, and the remaining sets were randomly divided into training (n = 8,276) and validation (n = 920) sets at a 9:1 ratio for developing a deep learning algorithm. Temporal (n = 294) and geographic (n = 308) external test sets were also collected. The diagnostic performance of the algorithm trained on a single view was compared with that of the algorithm trained on multiple views. The diagnostic performance of the algorithm and of two radiologists was assessed. Inter-observer agreement between the algorithm and the radiologists, and between the two radiologists, was calculated.

Results

The area under the receiver operating characteristic curve of the algorithm using multiple views (0.971, 0.978, and 0.965 for the gold standard, temporal external, and geographic external test sets, respectively) was higher than that of the algorithm using a single view (0.964/0.953, 0.952/0.961, and 0.961/0.942 for the AP view/lateral view of the gold standard, temporal external, and geographic external test sets, respectively) in all test sets. The algorithm showed statistically significantly higher specificity than the radiologists (p = 0.018 and 0.012). There was substantial agreement between the algorithm and the two radiologists and between the two radiologists (κ = 0.79, 0.8, and 0.76).

Conclusion

The deep learning algorithm trained on multiple views performed better than that trained on a single view. The diagnostic performance of the algorithm for detecting mastoiditis on mastoid series was similar to or higher than that of radiologists.
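
For readers unfamiliar with the agreement statistic, the following is a minimal sketch of how a κ value for two readers can be computed, assuming κ here denotes Cohen's kappa (the abstract does not name the variant). The function name `cohen_kappa` and the toy labels are hypothetical and are not taken from the study's data or code.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same cases.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each rater's
    marginal label frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of cases")

    n = len(rater_a)
    # Observed agreement: fraction of cases with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: sum over labels of the product of marginal rates.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label for every case.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical toy labels (1 = mastoiditis, 0 = normal); not study data.
algorithm = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
radiologist = [1, 1, 0, 0, 1, 0, 0, 0, 1, 1]
kappa = cohen_kappa(algorithm, radiologist)
print(round(kappa, 2))
```

On the commonly used Landis and Koch scale, values from 0.61 to 0.80 are described as substantial agreement, which matches the wording used for the reported values.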