Frontiers in Oncology (Feb 2023)

Deep learning for automatic head and neck lymph node level delineation provides expert-level accuracy

  • Thomas Weissmann,
  • Yixing Huang,
  • Stefan Fischer,
  • Johannes Roesch,
  • Sina Mansoorian,
  • Horacio Ayala Gaona,
  • Antoniu-Oreste Gostian,
  • Markus Hecht,
  • Sebastian Lettmaier,
  • Lisa Deloch,
  • Benjamin Frey,
  • Udo S. Gaipl,
  • Luitpold Valentin Distel,
  • Andreas Maier,
  • Heinrich Iro,
  • Sabine Semrau,
  • Christoph Bert,
  • Rainer Fietkau,
  • Florian Putz

DOI
https://doi.org/10.3389/fonc.2023.1115258
Journal volume & issue
Vol. 13

Abstract

Background: Deep learning-based head and neck lymph node level (HN_LNL) autodelineation is of high relevance to radiotherapy research and clinical treatment planning but still underinvestigated in academic literature. In particular, there is no publicly available open-source solution for large-scale autosegmentation of HN_LNL in the research setting.

Methods: An expert-delineated cohort of 35 planning CTs was used for training of an nnU-net 3D-fullres/2D-ensemble model for autosegmentation of 20 different HN_LNL. A second cohort acquired at the same institution later in time served as the test set (n = 20). In a completely blinded evaluation, 3 clinical experts rated the quality of deep learning autosegmentations in a head-to-head comparison with expert-created contours. For a subgroup of 10 cases, intraobserver variability was compared to the average deep learning autosegmentation accuracy on the original and recontoured set of expert segmentations. A postprocessing step to adjust craniocaudal boundaries of level autosegmentations to the CT slice plane was introduced, and the effect of autocontour consistency with CT slice plane orientation on geometric accuracy and expert rating was investigated.

Results: Blinded expert ratings for deep learning segmentations and expert-created contours were not significantly different. Deep learning segmentations with slice plane adjustment were rated numerically higher (mean, 81.0 vs. 79.6, p = 0.185) and deep learning segmentations without slice plane adjustment were rated numerically lower (77.2 vs. 79.6, p = 0.167) than manually drawn contours. In a head-to-head comparison, deep learning segmentations with CT slice plane adjustment were rated significantly better than deep learning contours without slice plane adjustment (81.0 vs. 77.2, p = 0.004). Geometric accuracy of deep learning segmentations was not different from intraobserver variability (mean Dice per level, 0.76 vs. 0.77, p = 0.307). Clinical significance of contour consistency with CT slice plane orientation was not represented by geometric accuracy metrics (volumetric Dice, 0.78 vs. 0.78, p = 0.703).

Conclusions: We show that an nnU-net 3D-fullres/2D-ensemble model can be used for highly accurate autodelineation of HN_LNL using only a limited training dataset and is ideally suited for large-scale standardized autodelineation of HN_LNL in the research setting. Geometric accuracy metrics are only an imperfect surrogate for blinded expert rating.
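
For readers unfamiliar with the volumetric Dice metric reported above, the following is a minimal sketch of the standard definition, Dice = 2|A ∩ B| / (|A| + |B|), applied to two sets of labelled voxels. It is not the authors' evaluation code; the function name `volumetric_dice`, the voxel-set representation, and the toy masks are assumptions made purely for illustration.

```python
from typing import Iterable, Set, Tuple

# (slice, row, column) index of a voxel that belongs to the structure
Voxel = Tuple[int, int, int]


def volumetric_dice(auto: Iterable[Voxel], expert: Iterable[Voxel]) -> float:
    """Dice similarity coefficient: 2 * |A ∩ B| / (|A| + |B|)."""
    a: Set[Voxel] = set(auto)
    b: Set[Voxel] = set(expert)
    if not a and not b:
        # Both masks empty: treated as perfect agreement here.
        # This is a convention assumed for the sketch, not taken from the paper.
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))


# Hypothetical toy masks: two 2x2x2 blocks offset by one slice
# along the craniocaudal (slice) axis.
expert_mask = {(z, y, x) for z in range(0, 2) for y in range(2) for x in range(2)}
auto_mask = {(z, y, x) for z in range(1, 3) for y in range(2) for x in range(2)}

dice = volumetric_dice(auto_mask, expert_mask)
print(f"Dice = {dice:.2f}")
```

In this toy case each mask holds 8 voxels and they share 4, so the coefficient is 2 · 4 / 16 = 0.5. In practice the masks would come from the segmentation label maps of each lymph node level, and the coefficient would be computed per level and then averaged.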

Keywords