IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2022)

Transformer-Driven Semantic Relation Inference for Multilabel Classification of High-Resolution Remote Sensing Images

  • Xiaowei Tan,
  • Zhifeng Xiao,
  • Jianjun Zhu,
  • Qiao Wan,
  • Kai Wang,
  • Deren Li

DOI
https://doi.org/10.1109/JSTARS.2022.3145042
Journal volume & issue
Vol. 15
pp. 1884 – 1901

Abstract

The complexity of remote sensing scenes makes it hard to describe an image with a single label. Multilabel image classification is therefore a more general and practical choice for high-resolution remote sensing (HRS) images. A vital problem in multilabel classification is how to construct the relations between categories. In recent years, some researchers have used recurrent neural networks (RNNs) or long short-term memory (LSTM) networks to exploit label relations. However, an RNN or LSTM models such category dependence in a chain-propagation manner, so its performance may suffer when a specific category is improperly inferred. To address this, we propose a novel HRS image multilabel classification network, the transformer-driven semantic relation inference network. The network comprises two modules: a semantic sensitive module (SSM) and a semantic relation-building module (SRBM). The SSM locates the semantic attentional regions in the features extracted by a deep convolutional neural network and generates a discriminative content-aware category representation (CACR). The SRBM performs label relation inference on the outputs of the SSM to predict the final results. The proposed method is characterized by its ability to extract semantic attentional regions relevant to each category, generate a discriminative CACR, and provide natural and interpretable reasoning about label relations. Experiments were performed on the public UCM multilabel and MLRSNet datasets. Quantitative and qualitative analyses on these state-of-the-art multilabel benchmarks showed that the proposed method effectively locates semantic regions and builds relationships between categories with better robustness.
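
The abstract gives no implementation details, so the following is only a toy sketch of the general two-stage idea (per-category attention over image features, then attention across categories) and should not be read as the authors' network. Every name, dimension, and random input is an assumption made for illustration, and plain dot-product attention without learned projections stands in for both the SSM and the SRBM.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def category_representations(features, category_queries):
    """SSM-like step (assumed form): each category query attends over the
    spatial feature vectors and pools them into one vector per category."""
    reps = []
    for q in category_queries:
        weights = softmax([dot(q, f) for f in features])  # attention over locations
        rep = [sum(w * f[d] for w, f in zip(weights, features))
               for d in range(len(q))]
        reps.append(rep)
    return reps


def relate_categories(reps):
    """SRBM-like step (assumed form): single-head self-attention across the
    category vectors, added back to the input as a residual. Every category
    sees every other one directly instead of through a chain."""
    scale = math.sqrt(len(reps[0]))
    out = []
    for q in reps:
        weights = softmax([dot(q, k) / scale for k in reps])
        mixed = [sum(w * v[d] for w, v in zip(weights, reps))
                 for d in range(len(q))]
        out.append([a + b for a, b in zip(q, mixed)])
    return out


def predict(reps, classifier_weights):
    """One independent sigmoid score per category (multilabel output)."""
    return [1.0 / (1.0 + math.exp(-dot(r, w)))
            for r, w in zip(reps, classifier_weights)]


# Hypothetical sizes: D feature channels, N spatial locations, C categories.
random.seed(0)
D, N, C = 8, 16, 3
features = [[random.gauss(0, 1) for _ in range(D)] for _ in range(N)]
queries = [[random.gauss(0, 1) for _ in range(D)] for _ in range(C)]
clf = [[random.gauss(0, 1) for _ in range(D)] for _ in range(C)]

cacr = category_representations(features, queries)
scores = predict(relate_categories(cacr), clf)
print(scores)
```

In a trained model the queries, projections, and classifier weights would be learned, and the features would come from a convolutional backbone rather than a random generator. The point of the sketch is only the contrast with chain propagation: the attention step relates all categories at once, so one poorly inferred category does not sit upstream of the others.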

Keywords