Scientific Reports (Aug 2024)

Automatic detection and visualization of temporomandibular joint effusion with deep neural network

  • Yeon-Hee Lee,
  • Seonggwang Jeon,
  • Jong-Hyun Won,
  • Q.-Schick Auh,
  • Yung-Kyun Noh

DOI
https://doi.org/10.1038/s41598-024-69848-9
Journal volume & issue
Vol. 14, no. 1
pp. 1 – 15

Abstract

This study investigated the usefulness of deep learning-based automatic detection of temporomandibular joint (TMJ) effusion using magnetic resonance imaging (MRI) in patients with temporomandibular disorder, and whether the diagnostic accuracy of the model improved when patients’ clinical information was provided in addition to the MR images. Sagittal MR images of 2948 TMJs were collected from 1017 women and 457 men (mean age 37.19 ± 18.64 years). The TMJ effusion diagnostic performance of three convolutional neural network training schemes (from-scratch, fine-tuning, and freeze) was compared with that of human experts based on areas under the curve (AUCs) and diagnostic accuracy. The fine-tuning model with proton density (PD) images showed acceptable prediction performance (AUC = 0.7895), whereas the from-scratch (0.6193) and freeze (0.6149) models performed worse (p < 0.05). The fine-tuning model had higher specificity than the human experts (87.25% vs. 58.17%), whereas the human experts were superior in sensitivity (80.00% vs. 57.43%) (all p < 0.001). In gradient-weighted class activation mapping (Grad-CAM) visualizations, the fine-tuning scheme focused more on effusion than on other structures of the TMJ, and its sparsity was higher than that of the from-scratch scheme (82.40% vs. 49.83%, p < 0.05). The Grad-CAM visualizations indicated that the model learned from important features in the TMJ area, particularly around the articular disc. Using two fine-tuning models, on PD and T2-weighted images, did not improve diagnostic performance compared with using PD alone (p < 0.05). AUCs varied across groups when the patients were divided according to age (0.7083–0.8375) and sex (male: 0.7576, female: 0.7083). The prediction accuracy of the ensemble model was higher than that of the human experts when all the data were used (74.21% vs. 67.71%, p < 0.05).
A deep neural network (DNN) was developed to process multimodal data, including MRI and patient clinical data. Analysis of four age groups with the DNN model showed that the 41–60 age group had the best performance (AUC = 0.8258). The fine-tuning model and the DNN were the best-performing approaches for judging TMJ effusion and may be used to reduce misdiagnosed cases and support human diagnostic performance. Assistive automated diagnostic methods have the potential to increase clinicians’ diagnostic accuracy.
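The abstract compares the models and the human experts on sensitivity, specificity, accuracy, and AUC. As a reading aid, the minimal sketch below shows how these four quantities are commonly computed for a binary effusion / no-effusion decision. It is not the authors' evaluation code: the function names, the 0.5 threshold, and the ten labels and scores are hypothetical values chosen only for illustration, and the AUC uses the standard pairwise-ranking (Mann-Whitney) formulation.

```python
def diagnostic_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy from binary labels (1 = effusion)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        # Share of true effusion cases that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Share of effusion-free cases that were correctly cleared.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        # Share of all cases classified correctly.
        "accuracy": (tp + tn) / len(y_true) if y_true else float("nan"),
    }


def auc_from_scores(y_true, scores):
    """AUC as the probability that a random positive case outscores a random
    negative case, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, NOT taken from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.7, 0.2, 0.1, 0.35, 0.05, 0.15]
threshold = 0.5  # assumed decision threshold
y_pred = [1 if s >= threshold else 0 for s in scores]

metrics = diagnostic_metrics(y_true, y_pred)
auc = auc_from_scores(y_true, scores)
print(metrics, auc)
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds. This is one reason a model and a human reader can trade places on sensitivity and specificity, as reported above, while still being comparable on overall accuracy.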

Keywords