IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2022)

Dual Network Structure With Interweaved Global-Local Feature Hierarchy for Transformer-Based Object Detection in Remote Sensing Image

  • Jingqian Xue,
  • Da He,
  • Mengwei Liu,
  • Qian Shi

DOI
https://doi.org/10.1109/JSTARS.2022.3198577
Journal volume & issue
Vol. 15
pp. 6856 – 6866

Abstract

Frequent and accurate object detection in remote sensing images is a promising approach for monitoring the dynamics of objects of interest on the Earth's surface. Transformer-based object detection was recently developed to address the tradeoff between heavy computational load and reduced accuracy that region-proposal-based and regression-based detectors face, and its self-attention mechanism provides a global understanding that can help reason about the spatial relationships among sparsely and heterogeneously distributed geospatial objects. However, transformer-based object detection is inherently weak at modeling the local feature hierarchy needed to handle the large scale variation of geospatial objects, and it is extremely difficult to train because it lacks inductive bias, which results in slow convergence. To overcome these problems, this article proposes a Dual network structure with InterweAved Global-local feature hierarchy based on the TRansformer architecture (DIAG-TR), which alleviates the incompatibility between the forms of global and local features and hierarchically embeds local features into global representations. In addition, a learnable anchor box is incorporated into the positional query in the decoder to provide a spatial prior, which accelerates convergence. The proposed DIAG-TR is validated on DIOR, a widely used optical remote sensing image dataset. The results show that the global-local feature hierarchy contributes a 3.4% gain in mean average precision over the original transformer-based method, and that convergence time is shortened by 2.5-fold. State-of-the-art methods are also included as benchmarks for comparison, and DIAG-TR outperforms the baseline method by 8.9%, demonstrating that DIAG-TR has great potential for the earth observation community.
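The idea of turning an anchor box into a positional query can be illustrated with a minimal sketch. The abstract does not say how DIAG-TR embeds its anchors, so the sinusoidal encoding below is an assumption borrowed from common practice in transformer detectors, and the names `sine_embed` and `anchor_to_positional_query` are hypothetical. In a real model the anchor coordinates would be trainable parameters; here they are fixed values used only to show the mapping.

```python
import math


def sine_embed(value, dim, temperature=10000.0):
    """Encode one scalar in [0, 1] as `dim` interleaved sin/cos values (dim must be even)."""
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    out = []
    for i in range(dim // 2):
        # Each sin/cos pair uses a progressively lower frequency.
        freq = temperature ** (2 * i / dim)
        angle = 2 * math.pi * value / freq
        out.append(math.sin(angle))
        out.append(math.cos(angle))
    return out


def anchor_to_positional_query(anchor, dim_per_coord=8):
    """Map a normalized (cx, cy, w, h) anchor box to one positional query vector.

    Assumption: each coordinate is embedded independently and the four
    embeddings are concatenated. This is an illustration, not the paper's method.
    """
    if len(anchor) != 4:
        raise ValueError("anchor must be (cx, cy, w, h)")
    query = []
    for coord in anchor:
        query.extend(sine_embed(coord, dim_per_coord))
    return query


# Stand-ins for learnable anchors; a training loop would update these values.
anchors = [(0.5, 0.5, 0.2, 0.1), (0.25, 0.75, 0.05, 0.05)]
queries = [anchor_to_positional_query(a) for a in anchors]
```

Because each query is derived from an explicit box, the decoder starts from a location and size hypothesis rather than from an unstructured vector, which is the kind of spatial prior the abstract credits with faster convergence.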

Keywords