IEEE Access (Jan 2022)

Focal Cosine Metric and Adaptive Attention Module for Remote Sensing Scene Classification With Siamese Convolutional Neural Networks

  • Lei Min,
  • Kun Gao,
  • Hong Wang,
  • Yutong Liu,
  • Zhenzhou Zhang,
  • Zibo Hu,
  • Xiaodian Zhang

DOI
https://doi.org/10.1109/ACCESS.2022.3188652
Journal volume & issue
Vol. 10
pp. 84212 – 84226

Abstract

Read online

Convolutional neural networks (CNNs) have been widely used in remote sensing (RS) scene classification tasks due to their remarkable feature representation and inference capability. The complexity of RS images not only brings the challenges of high inter-class similarity and large intra-class diversity, but also introduces the problem that category-relevant regions are insufficiently prominent in feature extraction. Siamese CNNs with feature similarity measurement are chosen in some applications to overcome the former issue, but most ignore the randomness of input sample pairs. This makes the Siamese CNNs not focus enough on challenging samples, which limits the training efficiency. We propose the focal cosine metric (FCM) block that combines the cosine similarity metric and the threshold control to achieve sample selection, thereby completing network learning more efficiently. FCM only permits the misclassified focal samples to participate in similarity measurement based on Siamese CNN. It flexibly mitigates the misclassification caused by the high inter-class similarity and large intra-class diversity. Moreover, the adaptive attention (AA) module is designed to stress the pivotal target regions and assist in the similarity measurement of Siamese CNN. This is realized by adaptively assigning high weights to key targets with learnable guided vectors. It enables the model to focus on the details of intra-class similarities or inter-class differences in sample pairs, and thus reduces the difficulty of model optimization. Encouraging experimental results on three public data sets demonstrate the effectiveness of the novel Siamese CNN-based method with FCM and AA and show its superiority compared to other state-of-the-art scene classification methods.

Keywords