IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2024)
Parallel Space and Channel Attention for Stronger Remote Sensing Object Detection
Abstract
Object detection in natural images tends to obtain high-level semantic information through repeated convolution and pooling, at the cost of the detailed information in the feature map. In remote sensing images, however, the targets of interest may occupy only a few pixels. This article designs a new attention mechanism that makes full use of the spatial and channel information of the image, strengthens the region of interest, and tries to preserve the image's original information. It finds the most influential locations in the spatial dimension and the most influential feature maps in the channel dimension, and then strengthens those channels and positions so that vital information becomes stronger while weak information is not lost. We combine the designed attention mechanism with existing modules to enhance the detection capability of YOLO-V7. We also merge two publicly available remote sensing image datasets, which increases the number of object types and enriches their appearance features, so that model performance can be tested more thoroughly. Experimental results on the merged dataset show that the enhanced model improves the detection of small- and medium-sized targets in complex backgrounds, with a 1% increase in mean average precision (mAP) and a maximum improvement of 8.2% for a single class. Medium-sized targets such as airports, dams, and soccer ball fields also improve by about 5%. Experiments on the DOTA1.0 dataset show that mAP improves by 1.1%, with 13 target categories achieving higher AP. The improved model also reduces computational complexity by 2.7%, which favors deployment on embedded devices.
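To illustrate the idea of weighting channels and positions in parallel while keeping the original signal, the following is a minimal sketch and not the module proposed in this article. It assumes parameter-free weights (global average pooling followed by a sigmoid) and an identity term so that weak responses are scaled up rather than suppressed. The function names, the weighting scheme, and the toy feature map are all hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def channel_weights(x):
    # Assumption: one weight per channel from global average pooling,
    # squashed to (0, 1). A real module would add learned layers here.
    return [sigmoid(sum(map(sum, ch)) / (len(ch) * len(ch[0]))) for ch in x]


def spatial_weights(x):
    # Assumption: one weight per position from the mean across channels.
    c, h, w = len(x), len(x[0]), len(x[0][0])
    return [
        [sigmoid(sum(x[k][i][j] for k in range(c)) / c) for j in range(w)]
        for i in range(h)
    ]


def parallel_attention(x):
    # Both branches read the same input (parallel, not sequential).
    cw = channel_weights(x)
    sw = spatial_weights(x)
    # The leading 1.0 is an identity path: every value keeps at least its
    # original magnitude, and salient channels and positions are amplified.
    return [
        [
            [v * (1.0 + cw[k] + sw[i][j]) for j, v in enumerate(row)]
            for i, row in enumerate(ch)
        ]
        for k, ch in enumerate(x)
    ]


# Toy feature map with 2 channels of size 2 x 2 (hypothetical values).
feature_map = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[4.0, 0.0], [0.0, 0.0]],
]
enhanced = parallel_attention(feature_map)
```

In this sketch the second channel and the top-left position receive the largest weights, so the value at that location is amplified the most, while every other value is left at no less than its original magnitude.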
Keywords