Remote Sensing (Aug 2024)
Enhancing Remote Sensing Object Detection with K-CBST YOLO: Integrating CBAM and Swin-Transformer
Abstract
Object detection via remote sensing encounters significant challenges due to factors such as small target sizes, uneven target distribution, and complex backgrounds. This paper introduces the K-CBST YOLO algorithm, which is designed to address these challenges. It features a novel architecture that integrates the Convolutional Block Attention Module (CBAM) and Swin-Transformer to enhance global semantic understanding of feature maps and maximize the utilization of contextual information. Such integration significantly improves the accuracy with which small targets are detected against complex backgrounds. Additionally, we propose an improved detection network that combines the improved K-Means algorithm with a smooth Non-Maximum Suppression (NMS) algorithm. This network employs an adaptive dynamic K-Means clustering algorithm to pinpoint target areas of concentration in remote sensing images that feature varied distributions and uses a smooth NMS algorithm to suppress the confidence of overlapping candidate boxes, thereby minimizing their interference with subsequent detection results. The enhanced algorithm substantially bolsters the model’s robustness in handling multi-scale target distributions, preserves more potentially valid information, and diminishes the likelihood of missed detections. This study involved experiments performed on the publicly available DIOR remote sensing image dataset and the DOTA aerial image dataset. Our experimental results demonstrate that, compared with other advanced detection algorithms, K-CBST YOLO outperforms all its counterparts in handling both datasets. It achieved a 68.3% mean Average Precision (mAP) on the DIOR dataset and a 78.4% mAP on the DOTA dataset.
Keywords