EJNMMI Physics (Mar 2023)

Does consensus contours improve robustness and accuracy on $^{18}$F-FDG PET imaging tumor delineation?

  • Mingzan Zhuang,
  • Zhifen Qiu,
  • Yunlong Lou

DOI: https://doi.org/10.1186/s40658-023-00538-7
Journal volume & issue: Vol. 10, no. 1, pp. 1–15

Abstract

Purpose: The aim of this study is to explore the robustness and accuracy of consensus contours using 225 clinical nasopharyngeal carcinoma (NPC) cases and 13 lung tumors simulated with the extended cardiac-torso (XCAT) phantom, based on 2-deoxy-2-[$^{18}$F]fluoro-D-glucose ($^{18}$F-FDG) PET imaging.

Methods: Primary tumors were segmented with two different initial masks on the 225 NPC $^{18}$F-FDG PET datasets and the 13 XCAT simulations, using four methods: automatic segmentation with active contour, affinity propagation (AP), contrast-oriented thresholding (ST), and thresholding at 41% of the maximum tumor value (41MAX). Consensus contours (ConSeg) were then generated from these results using the majority vote rule. The metabolically active tumor volume (MATV), relative volume error (RE), Dice similarity coefficient (DSC), and their respective test–retest (TRT) metrics between the two initial masks were used for quantitative analysis. The nonparametric Friedman test and post hoc Wilcoxon tests with Bonferroni adjustment for multiple comparisons were performed, with $P < 0.05$ considered significant.

Results: AP showed the highest MATV variability between masks. ConSeg had much better TRT performance in MATV than AP, and slightly poorer TRT performance in MATV than ST or 41MAX in most cases. Similar trends were found for RE and DSC on the simulated data. The average of the four segmentation results (AveSeg) showed better or comparable accuracy relative to ConSeg in most cases. AP, AveSeg and ConSeg achieved better RE and DSC with irregular masks than with rectangular masks. Additionally, all methods underestimated the tumor boundaries relative to the ground truth for the XCAT simulations that included respiratory motion.

Conclusions: The consensus method could be a robust approach to alleviating segmentation variability, but it did not seem to improve segmentation accuracy on average. Irregular initial masks might, at least in some cases, also help mitigate segmentation variability.
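
As an illustration of two of the steps named in the Methods, the sketch below applies a 41%-of-maximum threshold inside an initial mask and then builds a majority-vote consensus from several binary masks. It is a minimal sketch under stated assumptions, not the authors' implementation: the helper names `threshold_41max` and `majority_vote` are hypothetical, the threshold is assumed to be taken relative to the maximum inside the initial mask, and because the abstract does not say how a 2-2 tie among four methods is resolved, the default here (a strict majority, 3 of 4) is an assumption exposed as the `min_votes` parameter. The active contour, AP and ST methods are not reproduced.

```python
from typing import Optional, Sequence

import numpy as np


def threshold_41max(pet: np.ndarray, roi: np.ndarray, fraction: float = 0.41) -> np.ndarray:
    """Hypothetical helper: keep voxels inside the initial mask `roi` whose
    uptake is at least `fraction` times the maximum uptake inside `roi`."""
    roi = roi.astype(bool)
    peak = pet[roi].max()
    return roi & (pet >= fraction * peak)


def majority_vote(masks: Sequence[np.ndarray], min_votes: Optional[int] = None) -> np.ndarray:
    """Hypothetical helper: a voxel is kept when at least `min_votes` input
    masks contain it. The default strict majority (3 of 4 for four methods)
    is an assumption; the abstract does not state the tie rule."""
    stack = np.stack([np.asarray(m).astype(bool) for m in masks])
    if min_votes is None:
        min_votes = stack.shape[0] // 2 + 1
    return stack.sum(axis=0) >= min_votes


# Toy 1-D "image": the 41% threshold is 0.41 * 10.0 = 4.1.
pet = np.array([1.0, 3.0, 5.0, 10.0, 4.0, 0.5])
roi = np.ones(pet.shape, dtype=bool)
mask_41max = threshold_41max(pet, roi)  # [False, False, True, True, False, False]

# Four made-up method outputs on four voxels (not data from the paper).
seg_a = np.array([1, 1, 0, 0])
seg_b = np.array([1, 0, 1, 0])
seg_c = np.array([1, 1, 1, 0])
seg_d = np.array([0, 1, 0, 0])
consensus = majority_vote([seg_a, seg_b, seg_c, seg_d])  # [True, True, False, False]
consensus_2of4 = majority_vote([seg_a, seg_b, seg_c, seg_d], min_votes=2)  # [True, True, True, False]
```

Changing `min_votes` from 3 to 2 in this toy case adds the third voxel to the consensus, which shows why the tie rule matters when an even number of methods vote.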
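
The volume and overlap metrics used in the analysis can be sketched on binary masks as follows. This is an illustrative sketch, not the paper's code: the function names are hypothetical, the voxel volume of 0.064 mL (4 mm isotropic voxels) is an assumed example value, and RE is taken here as the signed percentage difference from the ground-truth volume, which is an assumption (some studies report its absolute value). With this sign convention a negative RE corresponds to the underestimation reported for the XCAT cases. The same `dice` call could also compare segmentations obtained from the two initial masks, but whether the paper's TRT metrics are defined exactly that way is an assumption.

```python
import numpy as np


def matv_ml(mask: np.ndarray, voxel_volume_ml: float) -> float:
    """MATV as voxel count times voxel volume (mL)."""
    return float(mask.astype(bool).sum()) * voxel_volume_ml


def relative_error(seg: np.ndarray, truth: np.ndarray) -> float:
    """Signed relative volume error in percent (assumed convention):
    (V_seg - V_true) / V_true * 100. Voxel volume cancels out."""
    v_seg = int(seg.astype(bool).sum())
    v_true = int(truth.astype(bool).sum())
    return 100.0 * (v_seg - v_true) / v_true


def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice similarity coefficient: 2|A and B| / (|A| + |B|)."""
    a = a.astype(bool)
    b = b.astype(bool)
    denom = int(a.sum() + b.sum())
    if denom == 0:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2.0 * int(np.logical_and(a, b).sum()) / denom


# Toy example: the segmentation misses one of four true tumor voxels.
truth = np.array([1, 1, 1, 1, 0, 0])
seg = np.array([0, 1, 1, 1, 0, 0])
re_pct = relative_error(seg, truth)  # -25.0 (underestimation)
dsc = dice(seg, truth)  # 6/7, about 0.857
volume = matv_ml(seg, voxel_volume_ml=0.064)  # about 0.192 mL
```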
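
The statistical procedure described in the Methods (a Friedman omnibus test followed by pairwise Wilcoxon signed-rank tests with Bonferroni adjustment, significance at $P < 0.05$) can be sketched with SciPy as below. This is a hedged illustration rather than the authors' analysis script: `compare_methods` is a hypothetical helper, the scores are synthetic (not results from the paper), and two-sided tests on paired per-case values are assumed.

```python
from itertools import combinations

import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon


def compare_methods(scores, alpha=0.05):
    """Friedman test across all methods, then pairwise Wilcoxon signed-rank
    tests with Bonferroni-adjusted p-values.

    `scores` maps method name -> per-case metric values in the same case order."""
    names = list(scores)
    _, p_friedman = friedmanchisquare(*(scores[n] for n in names))
    pairs = list(combinations(names, 2))
    posthoc = {}
    for a, b in pairs:
        _, p = wilcoxon(scores[a], scores[b])
        # Bonferroni: multiply by the number of comparisons, capped at 1.
        p_adj = min(1.0, float(p) * len(pairs))
        posthoc[(a, b)] = (p_adj, p_adj < alpha)
    return float(p_friedman), posthoc


# Synthetic paired data: 12 cases, four methods, method "D" shifted upward.
rng = np.random.default_rng(0)
base = rng.normal(size=12)
scores = {name: base + rng.normal(0.0, 0.1, size=12) for name in "ABC"}
scores["D"] = base + 5.0 + rng.normal(0.0, 0.1, size=12)

p_friedman, posthoc = compare_methods(scores)
for pair, (p_adj, significant) in posthoc.items():
    print(pair, round(p_adj, 4), significant)
```

Multiplying each p-value by the number of comparisons and comparing it to 0.05 is equivalent to comparing the raw p-value against 0.05 divided by the number of comparisons.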

Keywords