Alexandria Engineering Journal (May 2025)

MiM-UNet: An efficient building image segmentation network integrating state space models

  • Dong Liu,
  • Zhiyong Wang,
  • Ankai Liang

DOI
https://doi.org/10.1016/j.aej.2025.02.035
Journal volume & issue
Vol. 120
pp. 648 – 656

Abstract

With the advancement of remote sensing technology, the analysis of complex terrain images has become crucial for urban planning and geographic information extraction. However, existing models face significant challenges in processing intricate building structures: Transformer-based models suffer from high computational complexity and memory demands, while Convolutional Neural Networks (CNNs) often struggle to capture features across multiple scales and hierarchical levels. To address these limitations, we propose a novel architecture, Mamba-in-Mamba U-Net (MiM-UNet), which integrates the design principles of state space models (SSMs) to improve both computational efficiency and feature extraction capacity. Specifically, MiM-UNet refines the traditional encoder–decoder framework by introducing Mamba-in-Mamba blocks, enabling precise multi-scale feature capture and efficient information fusion. Experimental results on the Massachusetts building dataset show that MiM-UNet achieves higher segmentation accuracy than state-of-the-art models while substantially reducing computational overhead, indicating its potential for practical applications.
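
As a rough illustration of why SSM-based designs can be cheaper than attention, the sketch below runs a minimal discrete state space recurrence over a sequence in a single pass, so its cost grows linearly with sequence length. It is not the Mamba-in-Mamba block described in the paper: the function name `ssm_scan` and the fixed scalar parameters `a`, `b`, and `c` are assumptions chosen for illustration, whereas Mamba-style layers use multi-dimensional states and input-dependent parameters.

```python
def ssm_scan(x, a=0.5, b=1.0, c=1.0):
    """Run a scalar discrete state space recurrence over a sequence.

    Hypothetical toy model (not the paper's block):
        h_t = a * h_{t-1} + b * x_t   (state update)
        y_t = c * h_t                 (readout)
    """
    h = 0.0          # hidden state, initialised to zero
    ys = []
    for x_t in x:    # one pass over the sequence: O(len(x)) work
        h = a * h + b * x_t
        ys.append(c * h)
    return ys


# Feeding a unit impulse exposes the geometric decay of the state.
impulse_response = ssm_scan([1.0, 0.0, 0.0, 0.0])
print(impulse_response)
```

Each output depends on earlier inputs only through the carried state `h`, which is why no pairwise comparison between sequence positions is needed.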

Keywords