IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2024)
Local–Global Multiscale Fusion Network for Semantic Segmentation of Buildings in SAR Imagery
Abstract
Extracting buildings from synthetic aperture radar (SAR) images is a challenging task in remote sensing (RS). In recent years, convolutional neural networks (CNNs) have advanced rapidly and have been widely applied in RS. Researchers have investigated the potential of CNNs for the semantic segmentation of SAR images and have reported notable improvements. However, the semantic segmentation of buildings in SAR images remains difficult because the features of buildings closely resemble those of other ground objects and because building structures vary widely. In this article, we propose the local–global multiscale fusion network (LGMFNet), based on a dual encoder–decoder structure, for the semantic segmentation of buildings in SAR images. LGMFNet introduces a transformer-based auxiliary encoder to compensate for the limited global modeling ability of the CNN-based main encoder. To embed global dependencies hierarchically into the CNN, we design the global–local semantic aggregation module (GLSM). The GLSM serves as a bridge between the dual encoders to achieve semantic guidance and coupling from the local to the global level. Furthermore, to bridge the semantic gap between different scales, we design the multiscale feature fusion network (MSFN) as the decoder. The MSFN achieves interactive fusion of semantic information across scales by constructing the multiscale feature fusion module. Experimental results show that LGMFNet achieves a mean intersection over union (mIoU) of 91.17% on the BIGSARDATA 2023 AISAR competition dataset, outperforming the second-best method by a margin of 0.78%. These results demonstrate the advantage of LGMFNet over other state-of-the-art methods.
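For readers unfamiliar with the reported metric, the following is a minimal sketch of how mIoU is commonly computed for a binary building mask. It is not the evaluation code used in this work; the function name `mean_iou`, the label convention (1 = building, 0 = background), and the choice to average only over classes that occur in the prediction or the ground truth are assumptions made for illustration.

```python
from typing import Sequence


def mean_iou(
    pred: Sequence[Sequence[int]],
    target: Sequence[Sequence[int]],
    num_classes: int = 2,
) -> float:
    """Average the per-class intersection over union (IoU).

    Hypothetical helper: classes absent from both masks are skipped
    (an assumption; other conventions exist).
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # Pixel is in both the predicted and true region of class p.
                intersection[p] += 1
                union[p] += 1
            else:
                # Pixel counts toward the union of both classes involved.
                union[p] += 1
                union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Toy 2x3 masks (illustrative data only): 1 = building, 0 = background.
pred = [[1, 1, 0],
        [0, 0, 0]]
target = [[1, 0, 0],
          [0, 0, 1]]
score = mean_iou(pred, target)
```

In this toy case the background IoU is 3/5 and the building IoU is 1/3, so `score` is their average. Benchmark protocols may accumulate the counts over the whole dataset rather than per image, which can change the final value.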
Keywords