IEEE Access (Jan 2024)

MulA-nnUNet: A Multi-Attention Enhanced nnUNet Framework for 3D Abdominal Multi-Organs Segmentation

  • Jiashuo Ding,
  • Wei Ni,
  • Jiahui Wan,
  • Xiaojun Deng,
  • Lanjun Wan

DOI: https://doi.org/10.1109/ACCESS.2024.3437652
Journal volume & issue: Vol. 12, pp. 106658–106671

Abstract

In the domain of medical image segmentation, the nnUNet framework is highly regarded for its excellent performance and broad applicability. However, the inductive biases of locality and weight sharing introduced by its stacked convolutional operations limit the network's ability to model long-range dependencies. Furthermore, the substantial semantic gap between the encoder's output feature maps and the decoder's hampers the direct use of skip connections for feature fusion and gradient propagation, slowing convergence and degrading overall performance. This paper presents a novel framework, Multi-Attention nnUNet (MulA-nnUNet), which takes nnUNet as its backbone and integrates two key attention mechanisms: large kernel convolutional attention (LKA) and pixel attention (PA). LKA is embedded in the deep encoder stages, preserving the effectiveness of shallow feature extraction while strengthening the deep layers' ability to capture long-range spatial dependencies. Meanwhile, the PA module narrows the semantic gap between the encoder's and decoder's feature maps, improving the quality of skip-connection feature fusion. Model complexity is reduced by replacing the standard convolutions in the encoder and decoder layers with depthwise separable (DS) convolutions, which have far fewer parameters. The effectiveness of the proposed framework is confirmed by ablation studies and comparisons with current state-of-the-art models on the computed tomography (CT) subset of the multimodal abdominal multi-organ segmentation dataset (AMOS), comprising 500 CT scans: 350 for training, 75 for validation, and 75 for testing. MulA-nnUNet improves mean Dice similarity coefficient (mDSC) by 1.1% and mean intersection over union (mIoU) by 1.52%, while the baseline model requires roughly 5 times the floating-point operations (FLOPs) and over 7 times the parameters (Params) of the proposed model. It also segments organs such as the liver, stomach, aorta, and pancreas more accurately, thereby enhancing the accuracy of 3D abdominal multi-organ image segmentation.
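
The abstract does not give implementation details for the LKA module. As a rough illustration only, the following is a minimal PyTorch sketch of a 3D large-kernel-attention block in the style of the Visual Attention Network (VAN) decomposition (depthwise conv, dilated depthwise conv, pointwise conv, multiplicative gating); the kernel sizes, dilation, and 3D adaptation are assumptions, not taken from the paper.

    import torch
    import torch.nn as nn

    class LKA3D(nn.Module):
        """VAN-style large kernel attention adapted to 3D (assumed design).

        A large receptive field is decomposed into a 5x5x5 depthwise conv,
        a 7x7x7 depthwise conv with dilation 3, and a 1x1x1 pointwise conv;
        the resulting attention map gates the input multiplicatively.
        """
        def __init__(self, dim: int):
            super().__init__()
            self.dw_conv = nn.Conv3d(dim, dim, kernel_size=5, padding=2, groups=dim)
            self.dw_dilated = nn.Conv3d(dim, dim, kernel_size=7, padding=9,
                                        groups=dim, dilation=3)
            self.pointwise = nn.Conv3d(dim, dim, kernel_size=1)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            attn = self.pointwise(self.dw_dilated(self.dw_conv(x)))
            return x * attn  # per-voxel attention modulates the input features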
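Similarly, a pixel attention gate on the skip path can be sketched as a 1x1x1 convolution followed by a sigmoid, applied elementwise to the encoder features before they are fused with the decoder features. This follows the common PAN-style formulation of pixel attention; its exact placement on the skip connection here is an assumption.

    import torch
    import torch.nn as nn

    class PASkip(nn.Module):
        """PAN-style pixel attention on a skip connection (assumed placement).

        A per-voxel attention map re-weights the encoder features so they
        align better semantically with the decoder features they join.
        """
        def __init__(self, dim: int):
            super().__init__()
            self.attn = nn.Conv3d(dim, dim, kernel_size=1)

        def forward(self, enc_feat: torch.Tensor, dec_feat: torch.Tensor) -> torch.Tensor:
            gated = enc_feat * torch.sigmoid(self.attn(enc_feat))
            return torch.cat([gated, dec_feat], dim=1)  # fused skip features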
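Finally, the depthwise separable replacement factors a standard convolution into a per-channel spatial convolution and a 1x1x1 channel mixer. For a k x k x k kernel mapping C_in to C_out channels, this cuts the weight count from k^3 * C_in * C_out to k^3 * C_in + C_in * C_out, which is consistent with the large FLOPs and parameter savings reported. A minimal sketch:

    import torch
    import torch.nn as nn

    class DSConv3d(nn.Module):
        """Depthwise separable 3D convolution (standard factorization).

        Example weight counts (ignoring biases) for k=3, C_in=C_out=64:
          standard conv:  3^3 * 64 * 64 = 110,592 weights
          DS conv:        3^3 * 64 + 64 * 64 = 5,824 weights
        """
        def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3):
            super().__init__()
            self.depthwise = nn.Conv3d(in_ch, in_ch, kernel_size,
                                       padding=kernel_size // 2, groups=in_ch)
            self.pointwise = nn.Conv3d(in_ch, out_ch, kernel_size=1)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.pointwise(self.depthwise(x))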

Keywords