Remote Sensing (Aug 2024)
A Lightweight Pyramid Transformer for High-Resolution SAR Image-Based Building Classification in Port Regions
Abstract
Automatic classification of buildings within port areas from synthetic aperture radar (SAR) images is crucial for effective port monitoring and planning. Yet, the unique challenges of SAR imaging, such as side-looking geometry, multi-bouncing scattering, and the compact arrangement of structures, often lead to incomplete building structures and blurred boundaries in classification results. To address these issues, this paper introduces SPformer, an efficient and lightweight pyramid transformer model tailored for semantic segmentation. The SPformer utilizes a pyramid transformer encoder with spatially separable self-attention (SSSA) to refine both local and global spatial information and to process multi-scale features, enhancing the accuracy of building structure delineation. It also integrates a lightweight all multi-layer perceptron (ALL-MLP) decoder to consolidate multi-scale information across various depths and attention scopes, refining detail processing. Experimental results on the Gaofen-3 (GF-3) 1 m port building classification dataset demonstrate the effectiveness of SPformer, achieving competitive performance compared to state-of-the-art models, with mean intersection over union (mIoU) and mean F1-score (mF1) reaching 77.14% and 87.04%, respectively, while maintaining a compact model size and lower computational requirements. Experiments conducted on the entire scene of SAR images covering port area also show the good capabilities of the proposed method.
Keywords