BSDSNet: Dual-Stream Feature Extraction Network Based on Segment Anything Model for Synthetic Aperture Radar Land Cover Classification

Yangyang Wang; Wengang Zhang; Weidong Chen; Chang Chen

doi:10.3390/rs16071150

Remote Sensing (Mar 2024)

Affiliations

Yangyang Wang: School of Information Science and Technology, University of Science and Technology of China, Hefei 230037, China
Wengang Zhang: Electronic Countermeasure Institute, National University of Defense Technology, Hefei 230037, China
Weidong Chen: School of Information Science and Technology, University of Science and Technology of China, Hefei 230037, China
Chang Chen: School of Information Science and Technology, University of Science and Technology of China, Hefei 230037, China

DOI: https://doi.org/10.3390/rs16071150
Journal volume & issue: Vol. 16, no. 7, p. 1150

Abstract

Land cover classification using high-resolution Polarimetric Synthetic Aperture Radar (PolSAR) images obtained from satellites is a challenging task. Although deep learning algorithms have been studied extensively for PolSAR land cover classification, their performance is severely constrained by the scarcity of labeled PolSAR samples and by the limited domain adaptability of the models. Recently, the Segment Anything Model (SAM), which is built on the vision transformer (ViT), has reshaped the study of specific downstream tasks in computer vision. Benefiting from its large parameter count and extensive training datasets, SAM shows strong capabilities in semantic information extraction and generalization. Building on this, we propose a dual-stream feature extraction network based on SAM, i.e., BSDSNet. We replace SAM's image encoder with a dual-stream encoder, in which a ConvNeXt image encoder extracts local information and a ViT image encoder extracts global information. This allows BSDSNet to exploit both the semantic and the spatial information in PolSAR images. Additionally, to fuse this information at a fine-grained level, the SA-Gate module is employed to integrate the local–global information. Compared with previous deep learning models, BSDSNet's feature representation behaves like a flexible receptive field, making it well suited to classifying PolSAR images across various resolutions. Comprehensive evaluations indicate that BSDSNet achieves excellent qualitative and quantitative results on classification tasks with the AIR-PolSAR-Seg and WHU-OPT-SAR datasets. Compared with the second-best results, our method improves the Kappa metric by 3.68% on the AIR-PolSAR-Seg dataset and by 0.44% on the WHU-OPT-SAR dataset.
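
The abstract reports improvements in the Kappa metric without showing how it is computed. As a minimal sketch, assuming the metric is the standard Cohen's kappa over per-pixel class labels, it could be calculated as below. The function name `cohen_kappa` and the class labels are hypothetical and used only for illustration; this is not the authors' evaluation code.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa for two equal-length sequences of class labels.

    Illustrative sketch only; the function name and the example labels
    are assumptions, not taken from the BSDSNet paper.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")

    n = len(y_true)

    # Observed agreement: fraction of samples where prediction equals reference.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n

    # Chance agreement: for each class, multiply its frequency in the
    # reference by its frequency in the prediction, then sum over classes.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)

    if p_e == 1.0:
        # Both sequences contain a single identical class: kappa is undefined.
        return float("nan")

    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical labels for six pixels (class names are assumptions).
reference = ["water", "water", "urban", "urban", "forest", "forest"]
predicted = ["water", "water", "urban", "forest", "forest", "forest"]
kappa = cohen_kappa(reference, predicted)
print(round(kappa, 4))
```

In this toy example the observed agreement is 5/6 and the chance agreement is 1/3, so kappa comes out to 0.75. Unlike plain accuracy, kappa discounts agreement that class frequencies alone would produce, which is why it is commonly reported for land cover maps with imbalanced classes.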

Published in Remote Sensing

ISSN: 2072-4292 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science
Website: http://www.mdpi.com/journal/remotesensing/
