Remote Sensing (Apr 2025)
Remote Sensing Image Segmentation Using Vision Mamba and Multi-Scale Multi-Frequency Feature Fusion
Abstract
Rapid advancements in remote sensing (RS) imaging technology have heightened the demand for the precise and efficient interpretation of large-scale, high-resolution RS images. Although segmentation algorithms based on convolutional neural networks (CNNs) or Transformers have achieved significant performance improvements, the trade-off between segmentation precision and computational complexity remains a key limitation for practical applications. Therefore, this paper proposes CVMH-UNet—a hybrid semantic segmentation network that integrates the Vision Mamba (VMamba) framework with multi-scale feature fusion—to achieve high-precision and relatively efficient RS image segmentation. CVMH-UNet comprises two core modules: the hybrid visual state space block (HVSSBlock) and the multi-frequency multi-scale feature fusion block (MFMSBlock). The HVSSBlock integrates convolutional branches to enhance local feature extraction while employing a cross 2D scanning method (CS2D) to capture global information from multiple directions, enabling the synergistic modeling of global and local features. The MFMSBlock introduces multi-frequency information via the 2D Discrete Cosine Transform (2D DCT) and extracts multi-scale local details through point-wise convolution, thereby refining feature fusion in the skip connections between the encoder and decoder. Experimental results on benchmark RS datasets demonstrate that CVMH-UNet achieves state-of-the-art segmentation accuracy with favorable computational efficiency, surpassing existing advanced methods.
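The core idea behind the MFMSBlock's multi-frequency path can be illustrated with a minimal sketch: apply a 2D DCT to each channel of a feature map and keep a few selected frequency coefficients as per-channel descriptors, rather than summarizing each channel by global average pooling alone. The function names, the choice of frequencies, and the use of plain NumPy are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def dct2_matrix(n: int) -> np.ndarray:
    # Orthonormal DCT-II basis matrix of size n x n.
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    m[0] /= np.sqrt(2)
    return m * np.sqrt(2.0 / n)

def multi_frequency_descriptor(feat: np.ndarray, freqs) -> np.ndarray:
    """Per-channel descriptors from selected 2D DCT coefficients.

    feat: (C, H, W) feature map; freqs: list of (u, v) frequency indices.
    Returns a (C, len(freqs)) array: each channel is summarized by
    several frequency components instead of a single pooled value.
    """
    c, h, w = feat.shape
    dh, dw = dct2_matrix(h), dct2_matrix(w)
    spec = dh @ feat @ dw.T  # 2D DCT applied per channel (broadcast over C)
    return np.stack([spec[:, u, v] for (u, v) in freqs], axis=1)

# Example: 8 channels, 16x16 spatial; keep the DC term plus two low frequencies.
feat = np.random.default_rng(0).normal(size=(8, 16, 16))
desc = multi_frequency_descriptor(feat, [(0, 0), (0, 1), (1, 0)])
print(desc.shape)  # (8, 3)
```

Note that the (0, 0) coefficient of an orthonormal 2D DCT is proportional to the channel mean, so global average pooling is the special case where only the DC frequency is kept; adding further frequencies preserves texture information that pooling discards.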
Keywords