JOR Spine (Sep 2023)

Deep learning‐based detection and classification of lumbar disc herniation on magnetic resonance images

  • Weicong Zhang,
  • Ziyang Chen,
  • Zhihai Su,
  • Zhengyan Wang,
  • Jinjin Hai,
  • Chengjie Huang,
  • Yuhan Wang,
  • Bin Yan,
  • Hai Lu

DOI
https://doi.org/10.1002/jsp2.1276
Journal volume & issue
Vol. 6, no. 3

Abstract


Background: The severity assessment of lumbar disc herniation (LDH) on MR images is crucial for selecting suitable surgical candidates. However, the interpretation of MR images is time‐consuming and requires repetitive work. This study aims to develop and evaluate a deep learning‐based diagnostic model for automated LDH detection and classification on lumbar axial T2‐weighted MR images.

Methods: A total of 1115 patients were analyzed in this retrospective study, split into a development dataset (1015 patients, 15 249 images) and an external test dataset (100 patients, 1273 images). Experts labeled all images by consensus according to the Michigan State University (MSU) classification criterion, and the final labels were regarded as the reference standard. The automated diagnostic model comprised Faster R‐CNN and ResNeXt101 as the detection and classification networks, respectively. Diagnostic performance was evaluated by calculating mean intersection over union (IoU), accuracy, precision, sensitivity, specificity, F1 score, the area under the receiver operating characteristic curve (AUC), and the intraclass correlation coefficient (ICC) with 95% confidence intervals (CIs).

Results: High detection consistency was obtained in the internal test dataset (mean IoU = 0.82, precision = 98.4%, sensitivity = 99.4%) and the external test dataset (mean IoU = 0.70, precision = 96.3%, sensitivity = 97.8%). Overall accuracy for LDH classification was 87.70% (95% CI: 86.59%–88.86%) and 74.23% (95% CI: 71.83%–76.75%) in the internal and external test datasets, respectively. For internal testing, the proposed model achieved high agreement in classification (ICC = 0.87, 95% CI: 0.86–0.88, P < 0.001), which was higher than that of external testing (ICC = 0.79, 95% CI: 0.76–0.81, P < 0.001). The AUC for model classification was 0.965 (95% CI: 0.962–0.968) and 0.916 (95% CI: 0.908–0.925) in the internal and external test datasets, respectively.

Conclusions: The automated diagnostic model achieved high performance in detecting and classifying LDH and exhibited considerable consistency with experts' classification.
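
For readers unfamiliar with the detection metrics named above, the following is a minimal sketch of how mean IoU, precision, sensitivity, and F1 score are commonly computed for bounding-box detection. It is not the authors' evaluation code: the function names, the (x_min, y_min, x_max, y_max) box format, and the example counts are assumptions made for illustration, and the abstract does not state how predicted boxes were matched to reference boxes.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x_min, y_min, x_max, y_max)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (if any).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_iou(pairs: Sequence[Tuple[Box, Box]]) -> float:
    """Average IoU over already-matched (predicted, reference) box pairs."""
    if not pairs:
        return 0.0
    return sum(box_iou(p, r) for p, r in pairs) / len(pairs)


def detection_metrics(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, sensitivity (recall), and F1 from detection counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return precision, sensitivity, f1


if __name__ == "__main__":
    # Hypothetical boxes and counts, not data from the study.
    print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
    print(detection_metrics(tp=8, fp=2, fn=2))
```

Under this sketch, a detection would typically count as a true positive only when its IoU with a reference box exceeds a chosen threshold; that threshold is an evaluation design choice that the abstract does not report.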

Keywords