Remote Sensing (Apr 2025)
Remote Sensing Image Segmentation Using Vision Mamba and Multi-Scale Multi-Frequency Feature Fusion
Abstract
Rapid advancements in remote sensing (RS) imaging technology have heightened the demand for the precise and efficient interpretation of large-scale, high-resolution RS images. Although segmentation algorithms based on convolutional neural networks (CNNs) or Transformers have achieved significant performance improvements, the trade-off between segmentation precision and computational complexity remains a key limitation for practical applications. Therefore, this paper proposes CVMH-UNet—a hybrid semantic segmentation network that integrates the Vision Mamba (VMamba) framework with multi-scale feature fusion—to achieve high-precision and relatively efficient RS image segmentation. CVMH-UNet comprises two core modules: the hybrid visual state space block (HVSSBlock) and the multi-frequency multi-scale feature fusion block (MFMSBlock). The HVSSBlock integrates convolutional branches to enhance local feature extraction while employing a cross 2D scanning method (CS2D) to capture global information from multiple directions, enabling the synergistic modeling of global and local features. The MFMSBlock introduces multi-frequency information via the 2D Discrete Cosine Transform (2D DCT) and extracts multi-scale local details through point-wise convolution, thereby refining feature fusion in the skip connections between the encoder and decoder. Experimental results on benchmark RS datasets demonstrate that CVMH-UNet achieves state-of-the-art segmentation accuracy with favorable computational efficiency, surpassing existing advanced methods.
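The core idea behind the MFMSBlock's multi-frequency path can be illustrated with a minimal sketch: apply a 2D DCT to each channel of a feature map and keep a few selected frequency coefficients as per-channel descriptors, rather than summarizing each channel by global average pooling alone. The function names, the choice of frequencies, and the use of plain NumPy are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def dct2_matrix(n: int) -> np.ndarray:
    # Orthonormal DCT-II basis matrix of size n x n.
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    m[0] /= np.sqrt(2)
    return m * np.sqrt(2.0 / n)

def multi_frequency_descriptor(feat: np.ndarray, freqs) -> np.ndarray:
    """Per-channel descriptors from selected 2D DCT coefficients.

    feat: (C, H, W) feature map; freqs: list of (u, v) frequency indices.
    Returns a (C, len(freqs)) array: each channel is summarized by
    several frequency components instead of a single pooled value.
    """
    c, h, w = feat.shape
    dh, dw = dct2_matrix(h), dct2_matrix(w)
    spec = dh @ feat @ dw.T  # 2D DCT applied per channel (broadcast over C)
    return np.stack([spec[:, u, v] for (u, v) in freqs], axis=1)

# Example: 8 channels, 16x16 spatial; keep the DC term plus two low frequencies.
feat = np.random.default_rng(0).normal(size=(8, 16, 16))
desc = multi_frequency_descriptor(feat, [(0, 0), (0, 1), (1, 0)])
print(desc.shape)  # (8, 3)
```

Note that the (0, 0) coefficient of an orthonormal 2D DCT is proportional to the channel mean, so global average pooling is the special case where only the DC frequency is kept; adding further frequencies preserves texture information that pooling discards.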
Keywords