PolSAR Image Classification Via a Multigranularity Hybrid CNN-ViT Model With External Tokens and Cross-Attention

Wenke Wang; Jianlong Wang; Dou Quan; Meijuan Yang; Junding Sun; Bibo Lu

doi:10.1109/JSTARS.2024.3384420

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2024)

Affiliations

Wenke Wang: School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, China
Jianlong Wang: School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, China
Dou Quan: Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education of China, School of Artificial Intelligence, Xidian University, Xi'an, China
Meijuan Yang: School of Artificial Intelligence OPtics and ElectroNics, Northwestern Polytechnical University, Xi'an, China
Junding Sun: School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, China
Bibo Lu: School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, China

DOI: https://doi.org/10.1109/JSTARS.2024.3384420
Journal volume & issue: Vol. 17
pp. 8003 – 8019

Abstract

With the development of deep learning, convolutional neural networks (CNNs) and vision transformers (ViTs) have been applied increasingly widely to polarimetric synthetic aperture radar (PolSAR) image classification. However, PolSAR images carry very rich information because of their special data form, which makes it difficult for any existing single network structure to extract that information comprehensively. In addition, deep learning methods require a large amount of training data, whereas labeled PolSAR data are scarce and difficult to obtain. Therefore, a multigranularity hybrid CNN-ViT model based on external tokens and cross-attention is proposed for PolSAR image classification. First, because CNNs learn local features well, a CNN-based external feature extractor is designed to extract local features from the PolSAR image. Second, because ViTs can focus on global features, a multigranularity attention structure is constructed to extract global information at multiple scales. With these two modules, the model can access the feature information contained in PolSAR images more fully than a single network structure can. Next, to make further use of these features, a cross-attention feature fusion module is built to fuse global and local information at different granularities. Finally, a softmax classifier connected to the network outputs the final predictions. Experimental results on three benchmark datasets show that the proposed method, even when trained with a small amount of labeled data, achieves the best classification performance among the compared methods.
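
The abstract does not give implementation details for the cross-attention feature fusion module. As a rough illustration of the general idea only, the sketch below shows plain scaled dot-product cross-attention in which one set of tokens (standing in for the global ViT tokens) queries another set (standing in for the external CNN tokens). The function names, the toy token values, and the absence of learned projection matrices are all assumptions made for illustration; this is not the authors' module.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention without learned projections.

    queries: n vectors of length d (here: hypothetical global tokens)
    keys, values: m vectors of length d (here: hypothetical external tokens)
    Returns (outputs, weights): n fused vectors and an n x m weight matrix.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by 1/sqrt(d).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        w = softmax(scores)
        # Each output is the attention-weighted average of the value vectors.
        fused = [
            sum(wj * v[c] for wj, v in zip(w, values))
            for c in range(len(values[0]))
        ]
        outputs.append(fused)
        weights.append(w)
    return outputs, weights


# Toy data (hypothetical): two "global" tokens attend over three "local" tokens.
global_tokens = [[1.0, 0.0], [0.0, 1.0]]
local_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused, attn = cross_attention(global_tokens, local_tokens, local_tokens)
```

Each row of `attn` sums to 1, so every fused token is a convex combination of the local tokens. A trainable version would typically add learned query, key, and value projections and multiple heads.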

Published in IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing

ISSN: 1939-1404 (Print); 2151-1535 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Ocean engineering; Science: Physics: Geophysics. Cosmic physics
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=4609443
