Remote Sensing (Apr 2025)

Remote Sensing Image Segmentation Using Vision Mamba and Multi-Scale Multi-Frequency Feature Fusion

  • Yice Cao,
  • Chenchen Liu,
  • Zhenhua Wu,
  • Lei Zhang,
  • Lixia Yang

DOI
https://doi.org/10.3390/rs17081390
Journal volume & issue
Vol. 17, no. 8
p. 1390

Abstract

Rapid advancements in remote sensing (RS) imaging technology have heightened the demand for precise and efficient interpretation of large-scale, high-resolution RS images. Although segmentation algorithms based on convolutional neural networks (CNNs) or Transformers have achieved significant performance improvements, the trade-off between segmentation precision and computational complexity remains a key limitation for practical applications. Therefore, this paper proposes CVMH-UNet—a hybrid semantic segmentation network that integrates the Vision Mamba (VMamba) framework with multi-scale feature fusion—to achieve high-precision and efficient RS image segmentation. CVMH-UNet comprises two core modules: the hybrid visual state space block (HVSSBlock) and the multi-frequency multi-scale feature fusion block (MFMSBlock). The HVSSBlock integrates convolutional branches to enhance local feature extraction while employing a cross 2D scanning method (CS2D) to capture global information from multiple directions, enabling the synergistic modeling of global and local features. The MFMSBlock introduces multi-frequency information via the 2D Discrete Cosine Transform (2D DCT) and extracts multi-scale local details through point-wise convolution, thereby refining feature fusion in the skip connections between the encoder and decoder. Experimental results on benchmark RS datasets demonstrate that CVMH-UNet achieves state-of-the-art segmentation accuracy with favorable computational efficiency, surpassing existing advanced methods.
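To make the multi-frequency idea in the MFMSBlock concrete, the sketch below shows one common way to pool channel descriptors against several 2D DCT basis functions rather than a single global average (the (0, 0) DCT component is, up to a constant, ordinary global average pooling). This is a minimal NumPy illustration of DCT-based multi-frequency pooling in general, not the paper's exact implementation; the function names, chosen frequency indices, and tensor layout are assumptions for illustration.

```python
import numpy as np

def dct2d_basis(u, v, H, W):
    """2D DCT-II basis function for frequency indices (u, v) on an HxW grid."""
    bx = np.cos((2 * np.arange(H) + 1) * u * np.pi / (2 * H))
    by = np.cos((2 * np.arange(W) + 1) * v * np.pi / (2 * W))
    return np.outer(bx, by)  # shape (H, W)

def multi_frequency_pool(feat, freqs):
    """Project each channel of feat (C, H, W) onto the DCT bases in freqs.

    Returns a (C, len(freqs)) descriptor matrix. The (0, 0) basis is
    constant 1, so that column is the spatial sum of each channel,
    i.e. global average pooling up to a factor of H*W.
    """
    C, H, W = feat.shape
    out = np.empty((C, len(freqs)))
    for k, (u, v) in enumerate(freqs):
        basis = dct2d_basis(u, v, H, W)
        out[:, k] = (feat * basis[None]).sum(axis=(1, 2))
    return out

# Example: 8 channels of a 16x16 feature map, three low-frequency components.
feat = np.random.default_rng(0).normal(size=(8, 16, 16))
desc = multi_frequency_pool(feat, [(0, 0), (0, 1), (1, 0)])
```

In a full attention module such descriptors would typically be passed through a small MLP and a sigmoid to produce per-channel weights; here only the frequency pooling step is shown.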

Keywords