IEEE Access (Jan 2024)
DUSFormer: Dual-Swin Transformer V2 Aggregate Network for Polyp Segmentation
Abstract
Convolutional neural networks have inherent limitations in medical image segmentation. Because polyp datasets are scarce, models are prone to instability and overfitting during training. In addition, ambiguous target boundaries make segmentation more difficult. To address these issues, we propose DUSFormer, a Dual-Swin Transformer V2 Aggregate Network that more accurately captures spatial semantic features of varying complexity. Specifically, DUSFormer consists of two encoder-decoder pairs for progressive feature extraction and deep feature extraction. The decoder uses a Stepwise Feature Fusion (SFF) module to locally emphasize and fuse feature maps at different levels. This architecture disseminates feature information across all levels more efficiently, allowing features with global dependencies and local details to be integrated quickly. In addition, an Adaptive Correction Module (ACM) is introduced to aggregate edge information between corresponding encoder and decoder layers; it corrects predicted segmentations with irregular or blurred boundaries and increases segmentation precision. DUSFormer demonstrates advantages in both quantitative performance and generalization ability on three polyp image segmentation datasets.
Keywords