Radiation Oncology (Jan 2024)

Comparison of deep learning networks for fully automated head and neck tumor delineation on multi-centric PET/CT images

  • Yiling Wang,
  • Elia Lombardo,
  • Lili Huang,
  • Michele Avanzo,
  • Giuseppe Fanetti,
  • Giovanni Franchin,
  • Sebastian Zschaeck,
  • Julian Weingärtner,
  • Claus Belka,
  • Marco Riboldi,
  • Christopher Kurz,
  • Guillaume Landry

DOI
https://doi.org/10.1186/s13014-023-02388-0
Journal volume & issue
Vol. 19, no. 1
pp. 1 – 13

Abstract

Read online

Abstract Objectives Deep learning-based auto-segmentation of head and neck cancer (HNC) tumors is expected to have better reproducibility than manual delineation. Positron emission tomography (PET) and computed tomography (CT) are commonly used in tumor segmentation. However, current methods still face challenges in handling whole-body scans where a manual selection of a bounding box may be required. Moreover, different institutions might still apply different guidelines for tumor delineation. This study aimed at exploring the auto-localization and segmentation of HNC tumors from entire PET/CT scans and investigating the transferability of trained baseline models to external real world cohorts. Methods We employed 2D Retina Unet to find HNC tumors from whole-body PET/CT and utilized a regular Unet to segment the union of the tumor and involved lymph nodes. In comparison, 2D/3D Retina Unets were also implemented to localize and segment the same target in an end-to-end manner. The segmentation performance was evaluated via Dice similarity coefficient (DSC) and Hausdorff distance 95th percentile (HD95). Delineated PET/CT scans from the HECKTOR challenge were used to train the baseline models by 5-fold cross-validation. Another 271 delineated PET/CTs from three different institutions (MAASTRO, CRO, BERLIN) were used for external testing. Finally, facility-specific transfer learning was applied to investigate the improvement of segmentation performance against baseline models. Results Encouraging localization results were observed, achieving a maximum omnidirectional tumor center difference lower than 6.8 cm for external testing. The three baseline models yielded similar averaged cross-validation (CV) results with a DSC in a range of 0.71–0.75, while the averaged CV HD95 was 8.6, 10.7 and 9.8 mm for the regular Unet, 2D and 3D Retina Unets, respectively. More than a 10% drop in DSC and a 40% increase in HD95 were observed if the baseline models were tested on the three external cohorts directly. After the facility-specific training, an improvement in external testing was observed for all models. The regular Unet had the best DSC (0.70) for the MAASTRO cohort, and the best HD95 (7.8 and 7.9 mm) in the MAASTRO and CRO cohorts. The 2D Retina Unet had the best DSC (0.76 and 0.67) for the CRO and BERLIN cohorts, and the best HD95 (12.4 mm) for the BERLIN cohort. Conclusion The regular Unet outperformed the other two baseline models in CV and most external testing cohorts. Facility-specific transfer learning can potentially improve HNC segmentation performance for individual institutions, where the 2D Retina Unets could achieve comparable or even better results than the regular Unet.

Keywords