IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2025)

A Hybrid CNN-Transformer Network for Object Detection in Optical Remote Sensing Images: Integrating Local and Global Feature Fusion

  • Youxiang Huang,
  • Donglai Jiao,
  • Xingru Huang,
  • Tiantian Tang,
  • Guan Gui

DOI
https://doi.org/10.1109/JSTARS.2024.3483253
Journal volume & issue
Vol. 18
pp. 241 – 254

Abstract

Read online

Remote sensing images (RSIs) object detection is important in natural disaster management, urban planning and resource exploration. However, due to the large differences between RSIs and natural images (NIs), most of the existing object detectors for NIs cannot be directly used to process RSIs. Most existing models based on convolutional neural networks (CNNs) require additional design of specific attentional modules to relate small targets in RSIs to global positional relationships. In contrast, transformer-based models had to add modules to obtain more detailed information. This imposes additional computational overheads for deployment on edge devices. To solve the above-mentioned problem, we propose a hybrid CNN and transformer model (DConvTrans-LKA) to enhance the model's ability to acquire features and design a fusion of local and global attention mechanisms to fuse local features and global location information. To better fuse the feature and location information extracted by the model, we introduce a feature residual pyramid network to enhance the model's ability to fuse multiscale feature maps. Finally, we conduct experiments in three representative optical RSI datasets (NWPU VHR-10, HRRSD, and DIOR) to verify the effectiveness of our proposed DConvTrans-LKA method. The experimental results show that our proposed method reaches 61.7%, 82.1%, and 61.3% at mAP at 0.5, respectively, further demonstrating the potential of our proposed method in RSI object detection tasks.

Keywords