Scientific Reports (Oct 2024)

SwinUNeCCt: bidirectional hash-based agent transformer for cervical cancer MRI image multi-task learning

  • Chongshuang Yang,
  • Zhuoyi Tan,
  • YiJie Wang,
  • Ran Bi,
  • Tianliang Shi,
  • Jing Yang,
  • Chao Huang,
  • Peng Jiang,
  • Xiangyang Fu

DOI
https://doi.org/10.1038/s41598-024-75544-5
Journal volume & issue
Vol. 14, no. 1
pp. 1 – 14

Abstract


Cervical cancer is the fourth most common malignant tumor among women globally, posing a significant threat to women’s health. In 2022, approximately 600,000 new cases were reported, and 340,000 deaths occurred due to cervical cancer. Magnetic resonance imaging (MRI) is the preferred imaging method for diagnosing, staging, and evaluating cervical cancer. However, manual segmentation of MRI images is time-consuming and subjective, so there is an urgent need for automatic segmentation models that accurately identify cervical cancer lesions in MRI scans. All MRIs in our research are from cervical cancer patients diagnosed by pathology at Tongren City People’s Hospital. Strict data selection criteria and clearly defined inclusion and exclusion conditions were established to ensure data consistency and the accuracy of the research results. The dataset contains imaging data from 122 cervical cancer patients, with each patient having 100 pelvic dynamic contrast-enhanced MRI scans. Annotations were jointly completed by medical professionals from Universiti Putra Malaysia and the Radiology Department of Tongren City People’s Hospital to ensure data accuracy and reliability. Additionally, a novel computer-aided diagnosis model named SwinUNeCCt is proposed. This model incorporates (i) a bidirectional hash-based agent multi-head self-attention mechanism, which optimizes the interaction between local and global features in MRI and aids more accurate lesion identification, and (ii) reduced computational complexity of the self-attention mechanism. The effectiveness of the SwinUNeCCt model has been validated through comparisons with state-of-the-art 3D medical models, including nnUnet, TransBTS, nnFormer, UnetR, UnesT, SwinUNetR, and SwinUNeLCsT.
In semantic segmentation tasks without a classification module, the SwinUNeCCt model demonstrates excellent performance across multiple key metrics, achieving a 95HD of 6.25, an IoU of 0.669, and a DSC of 0.802, all of which are the best results among the compared models. At the same time, SwinUNeCCt strikes a good balance between computational efficiency and model complexity, requiring only 442.7 GFLOPs of compute and 71.2 M parameters. Furthermore, in semantic segmentation tasks that include a classification module, the SwinUNeCCt model also exhibits strong recognition capabilities; although the classification module slightly increases computational overhead and model complexity, its performance still surpasses that of the other comparative models. In summary, SwinUNeCCt achieves the best results among state-of-the-art 3D medical models across multiple key segmentation metrics while balancing computational efficiency and model complexity, and it maintains high performance even with the inclusion of a classification module.
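The abstract does not spell out the mechanics of the bidirectional hash-based agent attention, but the complexity reduction it claims is characteristic of agent-style attention in general: instead of letting all n tokens attend to each other (O(n²) cost), a small set of m agent tokens first aggregates global context and queries then attend only to those agents (O(n·m) cost). The sketch below illustrates only this generic agent-attention idea, not the paper's actual module; all names and shapes are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def agent_attention(Q, K, V, A):
    """Generic agent attention: m agent tokens mediate between n tokens.

    Q, K, V: (n, d) query/key/value matrices for the sequence.
    A:       (m, d) learned agent tokens, with m << n, so the cost is
             O(n*m*d) rather than the O(n^2*d) of full self-attention.
    """
    d = Q.shape[-1]
    # Step 1: agents gather global context by attending over all keys/values.
    agent_v = softmax(A @ K.T / np.sqrt(d)) @ V      # (m, d)
    # Step 2: each query attends only to the m agents (a low-rank shortcut).
    return softmax(Q @ A.T / np.sqrt(d)) @ agent_v   # (n, d)

n, m, d = 64, 8, 16
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
A = rng.standard_normal((m, d))
out = agent_attention(Q, K, V, A)  # (64, 16), computed via only 8 agents
```

For 3D MRI volumes the token count n grows cubically with resolution, which is why routing attention through a few agent tokens (here 8 instead of 64) matters far more than in 2D settings.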
