Remote Sensing (Oct 2022)

A Novel Hybrid Attention-Driven Multistream Hierarchical Graph Embedding Network for Remote Sensing Object Detection

  • Shu Tian,
  • Lin Cao,
  • Lihong Kang,
  • Xiangwei Xing,
  • Jing Tian,
  • Kangning Du,
  • Ke Sun,
  • Chunzhuo Fan,
  • Yuzhe Fu,
  • Ye Zhang

DOI
https://doi.org/10.3390/rs14194951
Journal volume & issue
Vol. 14, no. 19
p. 4951

Abstract

Multiclass geospatial object detection in high-spatial-resolution remote-sensing images (HSRIs) is a fundamental task that has recently attracted considerable attention in many remote-sensing applications. However, the complexity and uncertainty of the spatial distribution of multiclass geospatial objects remain major challenges for object detection in HSRIs. Most current remote-sensing object-detection approaches rely on deep convolutional neural networks (CNNs). Nevertheless, most existing methods focus only on mining visual characteristics and overlook discriminative spatial and semantic relations, which eventually degrades object-detection performance in HSRIs. To tackle these challenges, we propose a novel hybrid attention-driven multistream hierarchical graph embedding network (HA-MHGEN) that explores complementary spatial and semantic patterns to improve remote-sensing object-detection performance. Specifically, we first construct hierarchical spatial graphs for multiscale spatial relation representation. We then construct semantic graphs by integrating the word embeddings of object category labels into the graph nodes. Afterwards, we develop a self-attention-aware multiscale graph convolutional network (GCN) to derive stronger representations of intra- and interobject hierarchical spatial relations and of contextual semantic relations, respectively. These two relation networks are followed by a novel cross-attention-driven spatial- and semantic-feature fusion module that uses a multihead attention mechanism to learn associations between diverse spatial and semantic correlations and to guide them toward a more powerful discrimination ability. Through the collaborative learning of the three relation networks, the proposed HA-MHGEN captures explicit and implicit relations from spatial and semantic patterns and boosts multiclass object-detection performance in HSRIs.
Extensive experiments on three benchmarks, namely DOTA, DIOR, and NWPU VHR-10, demonstrate the effectiveness of the proposed method and its superiority over other advanced remote-sensing object-detection methods.
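
The abstract names two building blocks, graph convolution over object nodes and multihead cross-attention between spatial and semantic features, without giving equations. The following is a minimal NumPy sketch of those two generic operations, not the authors' implementation. Everything specific in it is an assumption made for illustration: the symmetric adjacency normalization is the standard one commonly used for GCNs, the attention omits learned query/key/value projections, the random arrays stand in for region features and label word embeddings, and the residual sum used for fusion is a placeholder choice.

```python
import numpy as np


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2, the usual symmetric GCN normalization."""
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops
    deg = a_hat.sum(axis=1)                 # >= 1 because of the self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(adj, feats, weight):
    """One graph-convolution layer with ReLU: relu(A_norm @ X @ W)."""
    return np.maximum(normalize_adjacency(adj) @ feats @ weight, 0.0)


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(query_feats, context_feats, num_heads):
    """Multihead scaled dot-product attention from queries to a context set.

    Simplification (assumption): no learned projections, and the context
    features serve as both keys and values.
    """
    n, dim = query_feats.shape
    m = context_feats.shape[0]
    assert dim % num_heads == 0, "feature size must be divisible by num_heads"
    head_dim = dim // num_heads
    # Split the feature axis into heads: (heads, nodes, head_dim).
    q = query_feats.reshape(n, num_heads, head_dim).transpose(1, 0, 2)
    k = context_feats.reshape(m, num_heads, head_dim).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)   # (heads, n, m)
    attn = softmax(scores, axis=-1)
    out = attn @ k                                          # (heads, n, head_dim)
    # Merge the heads back into one feature vector per query node.
    return out.transpose(1, 0, 2).reshape(n, dim), attn


# Toy graph with 4 object nodes (hypothetical data throughout).
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]], dtype=float)
region_feats = rng.normal(size=(4, 8))   # stand-in for visual region features
label_embeds = rng.normal(size=(4, 8))   # stand-in for label word embeddings

spatial = gcn_layer(adj, region_feats, rng.normal(size=(8, 8)))
semantic = gcn_layer(adj, label_embeds, rng.normal(size=(8, 8)))

# Spatial nodes attend to semantic nodes; fuse with a residual sum.
attended, attn = cross_attention(spatial, semantic, num_heads=2)
fused = spatial + attended
```

In this sketch each row of `fused` combines a node's spatial representation with an attention-weighted mix of the semantic node representations. A trained model would learn the weight matrices and attention projections rather than draw them at random.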

Keywords