International Journal of Applied Earth Observation and Geoinformation (Jun 2024)
CaSaFormer: A cross- and self-attention based lightweight network for large-scale building semantic segmentation
Abstract
Buildings play a crucial role in geographic information systems, and advances in the resolution of remote sensing imagery have made it possible to extract them at larger scales. However, this progress has also raised the demands on both the efficiency and the generalization performance of extraction methods. To this end, we propose CaSaFormer, a lightweight network for building semantic segmentation. Specifically, we propose an efficient module composed of Cross-attention and Self-attention Blocks connected in series (CaSa Block) to extract valuable semantic information from the feature pyramid. Furthermore, we develop a novel Cross-Attention Gate Fusion (CAGF) module to effectively integrate the complementary components of global semantic features and local spatial features. Experimental results demonstrate that CaSaFormer outperforms state-of-the-art (SOTA) lightweight methods with the best trade-off between accuracy and efficiency, achieving a 1.92 % improvement in IoU with only 16 % of the computational complexity. When compared with non-lightweight methods under equivalent computational resources, CaSaFormer also achieves a 1.69 % IoU gain with only 1.7 % of the computational complexity. The code is available at: https://github.com/YpingHu/CaSaFormer.
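To make the two module descriptions more concrete, the following is a minimal NumPy sketch of the general ideas only: a block that applies cross-attention and then self-attention in series, and a gate that mixes local features with cross-attended global features. It is not the authors' implementation (see the repository above for that). All names (`CaSaBlockSketch`, `cross_attention_gate_fusion_sketch`) and design details are hypothetical assumptions: queries are assumed to come from one token set and keys/values from pooled pyramid tokens, and the sketch uses plain single-head attention with no normalization, MLPs, or efficiency tricks.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product attention: (Nq, d), (Nk, d), (Nk, d) -> (Nq, d)."""
    d = q.shape[-1]
    weights = softmax(q @ k.T / np.sqrt(d), axis=-1)
    return weights @ v


class CaSaBlockSketch:
    """Hypothetical cross-attention -> self-attention block (NOT the authors' code).

    Assumption: `target` tokens (e.g. one pyramid level) query `context`
    tokens (e.g. tokens pooled from the whole feature pyramid).
    """

    def __init__(self, dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(dim)
        # Separate random projections for the cross- and self-attention stages.
        self.w = {name: rng.standard_normal((dim, dim)) * scale
                  for name in ("cq", "ck", "cv", "sq", "sk", "sv")}

    def __call__(self, target: np.ndarray, context: np.ndarray) -> np.ndarray:
        w = self.w
        # Stage 1: cross-attention, target tokens attend to context tokens (residual add).
        x = target + attention(target @ w["cq"], context @ w["ck"], context @ w["cv"])
        # Stage 2: self-attention over the updated tokens (the "in series" connection).
        x = x + attention(x @ w["sq"], x @ w["sk"], x @ w["sv"])
        return x


def cross_attention_gate_fusion_sketch(global_feat: np.ndarray,
                                       local_feat: np.ndarray,
                                       w_gate: np.ndarray) -> np.ndarray:
    """Hypothetical gated fusion (NOT the paper's CAGF definition).

    local_feat: (N, d) queries; global_feat: (M, d) keys/values; w_gate: (2d, d).
    Returns an element-wise convex mix of attended global and local features.
    """
    attended = attention(local_feat, global_feat, global_feat)  # (N, d)
    logits = np.concatenate([local_feat, attended], axis=-1) @ w_gate
    gate = 1.0 / (1.0 + np.exp(-logits))  # sigmoid, values in (0, 1)
    return gate * attended + (1.0 - gate) * local_feat


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    target = rng.standard_normal((16, 32))   # e.g. 4x4 tokens with 32 channels
    context = rng.standard_normal((64, 32))  # e.g. pooled multi-scale tokens
    out = CaSaBlockSketch(dim=32)(target, context)
    fused = cross_attention_gate_fusion_sketch(context, out, rng.standard_normal((64, 32)) * 0.1)
    print(out.shape, fused.shape)  # (16, 32) (16, 32)
```

The serial ordering is the point being illustrated: cross-attention first injects information from the other feature set, and self-attention then refines relationships among the updated tokens. Because the gate is a sigmoid, each output element lies between the corresponding attended-global and local values.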