Remote Sensing (Apr 2022)
An Attention Cascade Global–Local Network for Remote Sensing Scene Classification
Abstract
Remote sensing image scene classification is an important task in remote sensing image interpretation, and it has recently been addressed effectively by convolutional neural networks (CNNs) owing to their powerful feature-learning ability. However, because remote sensing images contain multiple types of geographical information as well as redundant background information, most CNN-based methods, especially those that rely on a single CNN model or that ignore the combination of global and local features, exhibit limited classification accuracy. To compensate for this insufficiency, we propose a new dual-model deep feature fusion method based on an attention cascade global–local network (ACGLNet). Specifically, we use two popular CNNs as feature extractors to extract complementary multiscale features from the input image. Considering the characteristics of the global and local features, the proposed ACGLNet filters redundant background information from the low-level features through a spatial attention mechanism, after which the locally attended features are fused with the high-level features. Bilinear fusion is then employed to produce the fused representation of the dual model, which is finally fed to the classifier. Through extensive experiments on four public remote sensing scene datasets, including UCM, AID, PatternNet, and OPTIMAL-31, we demonstrate the feasibility of the proposed method and its superiority over state-of-the-art scene classification methods.
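The two fusion steps described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names are our own, the spatial attention here simply combines channel-wise average and max pooling through a sigmoid (standing in for the learned convolution a real attention module would use), and the bilinear fusion follows the standard outer-product pooling with signed square-root and L2 normalization.

```python
import numpy as np

def spatial_attention(feat):
    """Weight a (C, H, W) feature map by a spatial attention mask.

    Sketch only: a learned module would pass the pooled maps through
    a convolution; here we average them directly (assumption).
    """
    avg = feat.mean(axis=0, keepdims=True)   # (1, H, W) channel-wise average
    mx = feat.max(axis=0, keepdims=True)     # (1, H, W) channel-wise maximum
    attn = 1.0 / (1.0 + np.exp(-(avg + mx) / 2.0))  # sigmoid -> mask in (0, 1)
    return feat * attn                        # broadcast mask over channels

def bilinear_fusion(a, b):
    """Fuse two global feature vectors by outer-product (bilinear) pooling."""
    outer = np.outer(a, b).ravel()            # (Da * Db,) pairwise interactions
    signed = np.sign(outer) * np.sqrt(np.abs(outer))  # signed square-root
    return signed / (np.linalg.norm(signed) + 1e-12)  # L2 normalization
```

The bilinear step captures pairwise interactions between the two CNN streams, so the fused vector has dimension `Da * Db` and is typically followed by a compact classifier.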
Keywords