An Adaptive Attention Fusion Mechanism Convolutional Network for Object Detection in Remote Sensing Images

Yuanxin Ye; Xiaoyue Ren; Bai Zhu; Tengfeng Tang; Xin Tan; Yang Gui; Qin Yao

doi:10.3390/rs14030516

Remote Sensing (Jan 2022)

Affiliations

Yuanxin Ye: Faculty of Geosciences and Environmental Engineering, Southwest Jiaotong University, Chengdu 610031, China
Xiaoyue Ren: Faculty of Geosciences and Environmental Engineering, Southwest Jiaotong University, Chengdu 610031, China
Bai Zhu: Faculty of Geosciences and Environmental Engineering, Southwest Jiaotong University, Chengdu 610031, China
Tengfeng Tang: Faculty of Geosciences and Environmental Engineering, Southwest Jiaotong University, Chengdu 610031, China
Xin Tan: Faculty of Geosciences and Environmental Engineering, Southwest Jiaotong University, Chengdu 610031, China
Yang Gui: The 9th System Design Department of China Aerospace Science Industry Corporation, Wuhan 430000, China
Qin Yao: Northwest Institute of Nuclear Technology, Xi’an 710025, China

DOI: https://doi.org/10.3390/rs14030516
Journal volume & issue: Vol. 14, no. 3, p. 516

Abstract

In remote sensing object detection, existing convolutional neural networks still struggle to fuse the most informative features automatically and to remain robust to objects of widely varying scale. To address this, we develop a convolutional network model with an adaptive attention fusion mechanism (AAFM), built on the backbone network of EfficientDet. First, according to the distribution of objects in the dataset, a stitcher is applied to compose single images that contain objects of various scales. This process effectively balances the proportion of multi-scale objects and handles their scale variability. In addition, inspired by channel attention, a spatial attention model is introduced in the construction of the adaptive attention fusion mechanism. In this mechanism, the semantic information of the different feature maps is obtained via convolution and different pooling operations. The parallel spatial and channel attention are then fused in optimal proportions by fusion factors to obtain more representative feature information. Finally, the Complete Intersection over Union (CIoU) loss is used so that the predicted bounding box better covers the ground truth. Experimental results on the optical image dataset DIOR demonstrate that, compared with state-of-the-art detectors such as the Single Shot MultiBox Detector (SSD), You Only Look Once (YOLO) v4, and EfficientDet, the proposed module improves accuracy and is more robust.
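
The CIoU loss named in the abstract can be illustrated with a small sketch. The code below is not taken from the paper; it follows the commonly used CIoU definition (1 - IoU, plus a normalized center-distance term, plus an aspect-ratio consistency term) for a single pair of boxes. The function name ciou_loss, the (x1, y1, x2, y2) box layout, and the eps stabilizer are assumptions made for illustration only.

```python
import math


def ciou_loss(box, gt, eps=1e-9):
    """CIoU loss for two axis-aligned boxes given as (x1, y1, x2, y2).

    Illustrative sketch only: the box layout and eps are assumptions,
    not details taken from the paper.
    """
    x1, y1, x2, y2 = box
    gx1, gy1, gx2, gy2 = gt
    w, h = x2 - x1, y2 - y1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Intersection over Union (IoU) of the two boxes.
    inter_w = max(0.0, min(x2, gx2) - max(x1, gx1))
    inter_h = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = inter_w * inter_h
    union = w * h + gw * gh - inter
    iou = inter / (union + eps)

    # Squared distance between box centers.
    rho2 = ((x1 + x2) / 2 - (gx1 + gx2) / 2) ** 2 \
        + ((y1 + y2) / 2 - (gy1 + gy2) / 2) ** 2

    # Squared diagonal of the smallest box enclosing both boxes.
    enc_w = max(x2, gx2) - min(x1, gx1)
    enc_h = max(y2, gy2) - min(y1, gy1)
    c2 = enc_w ** 2 + enc_h ** 2 + eps

    # Aspect-ratio consistency term v and its trade-off weight alpha.
    v = (4 / math.pi ** 2) * (
        math.atan(gw / (gh + eps)) - math.atan(w / (h + eps))
    ) ** 2
    alpha = v / ((1 - iou) + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v


if __name__ == "__main__":
    # Partially overlapping squares of equal shape.
    print(ciou_loss((0, 0, 2, 2), (1, 1, 3, 3)))
```

Unlike plain IoU, the center-distance term still gives a useful signal when the predicted box and the ground truth do not overlap at all, which is one reason this family of losses is used for bounding-box regression.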

Published in Remote Sensing

ISSN: 2072-4292 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science
Website: http://www.mdpi.com/journal/remotesensing/
