Physics and Imaging in Radiation Oncology (Apr 2024)

Automatic gross tumor volume segmentation with failure detection for safe implementation in locally advanced cervical cancer

  • Rahimeh Rouhi,
  • Stéphane Niyoteka,
  • Alexandre Carré,
  • Samir Achkar,
  • Pierre-Antoine Laurent,
  • Mouhamadou Bachir Ba,
  • Cristina Veres,
  • Théophraste Henry,
  • Maria Vakalopoulou,
  • Roger Sun,
  • Sophie Espenel,
  • Linda Mrissa,
  • Adrien Laville,
  • Cyrus Chargari,
  • Eric Deutsch,
  • Charlotte Robert

Journal volume & issue
Vol. 30
p. 100578

Abstract

Background and Purpose: Automatic segmentation methods have greatly changed the radiotherapy (RT) workflow, but they still need to be extended to target volumes. In this paper, deep learning (DL) models were compared for Gross Tumor Volume (GTV) segmentation in locally advanced cervical cancer (LACC), and a novel investigation into failure detection based on radiomic features was introduced.

Methods and materials: We trained eight DL models (UNet, VNet, SegResNet, and SegResNetVAE, each in 2D and 3D) for segmentation. The final segmentation was generated by ensembling the models trained individually during cross-validation. To detect failures, binary classifiers were trained on radiomic features extracted from the segmented GTVs to classify each contour according to whether its Dice Similarity Coefficient (DSC) fell below a threshold T (DSC < T) or not (DSC ≥ T). Two distinct cohorts of T2-weighted (T2W) pre-RT MR images acquired with 2D sequences were used: a retrospective cohort of 115 LACC patients from 30 scanners, and a prospective cohort of 51 patients from 7 scanners, used for testing.

Results: On the test cohort, 2D-SegResNet achieved the best DSC, Surface DSC (SDSC3mm), and 95th percentile Hausdorff Distance (95HD): DSC = 0.72 ± 0.16, SDSC3mm = 0.66 ± 0.17, and 95HD = 14.6 ± 9.0 mm, with no missing segmentations (M = 0). With a threshold of T = 0.67 on DSC values, failure detection using a Logistic Regression (LR) classifier achieved precision P = 0.88, recall R = 0.75, F1-score F = 0.81, and accuracy A = 0.86 on the test cohort.

Conclusions: Segmentation accuracy varied only slightly among the DL methods, with 2D networks outperforming 3D networks on 2D MRI sequences. Physicians found the time saved advantageous. The proposed failure detection could guide physicians in sensitive cases.
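To make the failure-detection labels and reported metrics concrete, the following minimal Python sketch computes a DSC between two binary masks, labels a contour as a failure when DSC < T, and derives precision, recall, F1-score, and accuracy from true and predicted labels. It is an illustration only, not the authors' implementation: the function names, the flat 0/1 mask representation, the convention that two empty masks score 1.0, and the choice of "failure" as the positive class are all assumptions.

```python
def dice(pred, ref):
    """Dice Similarity Coefficient between two flat binary masks (0/1 sequences)."""
    if len(pred) != len(ref):
        raise ValueError("masks must contain the same number of voxels")
    intersection = sum(1 for p, r in zip(pred, ref) if p and r)
    total = sum(1 for p in pred if p) + sum(1 for r in ref if r)
    # Assumed convention: two empty masks are treated as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


def is_failure(dsc, threshold=0.67):
    """Label a contour as a failure when DSC < T (T = 0.67 as in the abstract)."""
    return dsc < threshold


def detection_metrics(true_failures, predicted_failures):
    """Precision, recall, F1 and accuracy, with 'failure' assumed as the positive class."""
    pairs = list(zip(true_failures, predicted_failures))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}
```

In this sketch the true labels would come from comparing each automatic contour with its reference contour, while the predicted labels would come from a classifier that sees only features of the automatic contour. As a consistency check on the reported values, F1 = 2PR / (P + R) = 2 × 0.88 × 0.75 / (0.88 + 0.75) ≈ 0.81.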

Keywords