Geocarto International (Jan 2024)

SDMSEAF-YOLOv8: a framework to significantly improve the detection performance of unmanned aerial vehicle images

  • Linxuan Li,
  • Xiaoyu Liu,
  • Xuan Chen,
  • Fengjuan Yin,
  • Bin Chen,
  • Yufeng Wang,
  • Fanbin Meng

DOI
https://doi.org/10.1080/10106049.2024.2339294
Journal volume & issue
Vol. 39, no. 1

Abstract


The detailed, high-resolution images captured by drones challenge target detection algorithms with complex scenes and small targets. Moreover, targets in unmanned aerial vehicle (UAV) images are often affected by viewing angle, occlusion, and illumination, which make detection harder still. To address these issues, we propose SDMSEAF-YOLOv8, an improved target detector based on YOLOv8 that incorporates a Bi-directional Feature Pyramid Network (BiFPN) to strengthen the model's perception of multiscale targets. A space-to-depth layer replaces the conventional strided convolution layer to better extract fine-grained information and small-target features. A Multi-Separated and Enhancement Attention module improves feature learning in occluded target regions, reducing missed and false detections. Four detection heads, each responsible for a different range of target sizes, improve the accuracy and robustness of tiny-target detection. Finally, the conventional non-maximum suppression (NMS) algorithm is modified to reduce missed detections in densely occluded scenes: rather than simply discarding overlapping boxes, an attenuation function lowers the confidence of each candidate box according to its overlap with the highest-scoring box. Experiments on the VisDrone2019-DET-val dataset show that SDMSEAF-YOLOv8 outperforms state-of-the-art models, reaching a mAP of 42.9% at 640-pixel resolution: 14.8% higher than the baseline YOLOv8-x model and 6.0% higher than the state-of-the-art Fine-Grained Target Focusing Network, at twice that model's detection speed.
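The abstract does not say how the BiFPN fuses features. In the original BiFPN design (EfficientDet), each fusion node combines same-resolution inputs with learnable non-negative weights using "fast normalized fusion", O = Σ_i (w_i / (ε + Σ_j w_j)) · I_i, with a ReLU keeping each w_i ≥ 0 and ε = 0.0001. The sketch below applies that formula to features flattened into plain Python lists and assumed to be already resized to a common resolution. Whether SDMSEAF-YOLOv8 uses exactly this weighting is an assumption, and the function name `fast_normalized_fusion` is hypothetical.

```python
from typing import List, Sequence


def fast_normalized_fusion(
    inputs: Sequence[Sequence[float]],
    weights: Sequence[float],
    eps: float = 1e-4,
) -> List[float]:
    """Weighted fusion of same-shape feature vectors, BiFPN style (assumed).

    Computes O = sum_i (w_i / (eps + sum_j w_j)) * I_i, where each raw weight
    is passed through ReLU first so that no input contributes negatively.
    """
    if len(inputs) != len(weights):
        raise ValueError("need exactly one weight per input feature")
    length = len(inputs[0])
    if any(len(feat) != length for feat in inputs):
        raise ValueError("all inputs must already share the same shape")
    # ReLU on the learnable weights keeps them non-negative.
    w = [max(0.0, wi) for wi in weights]
    # eps avoids division by zero when every weight is clamped to 0.
    norm = eps + sum(w)
    return [
        sum(wi * feat[k] for wi, feat in zip(w, inputs)) / norm
        for k in range(length)
    ]
```

Normalizing by the weight sum rather than applying a softmax keeps each weight in roughly [0, 1] at lower cost, which is the trade-off EfficientDet cites for this choice.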
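The abstract states that a space-to-depth layer replaces strided convolution so that fine-grained detail is not lost during downsampling. A minimal sketch of the rearrangement step, assuming the SPD-Conv design in which every scale x scale spatial block is moved into the channel dimension and a non-strided convolution (omitted here) follows. The nested-list layout `[channel][row][col]`, the order of the sub-maps, and the name `space_to_depth` are illustrative assumptions, not the paper's code.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]


def space_to_depth(x: FeatureMap, scale: int = 2) -> FeatureMap:
    """Move each scale x scale spatial block into the channel dimension.

    Output shape is [C * scale**2][H // scale][W // scale]. Every input value
    appears exactly once in the output, so no spatial information is dropped.
    """
    height, width = len(x[0]), len(x[0][0])
    if height % scale or width % scale:
        raise ValueError("height and width must be divisible by scale")
    out: FeatureMap = []
    # One group of C output channels per offset (dy, dx) inside each block.
    for dy in range(scale):
        for dx in range(scale):
            for channel in x:
                # Rows dy, dy+scale, ...; within each, columns dx, dx+scale, ...
                out.append([row[dx::scale] for row in channel[dy::scale]])
    return out
```

For a single-channel 4x4 map holding the values 0 to 15 row by row, `space_to_depth(x)` returns four 2x2 maps, [[0, 2], [8, 10]], [[1, 3], [9, 11]], [[4, 6], [12, 14]] and [[5, 7], [13, 15]], so all 16 values are kept; a stride-2 convolution, by contrast, computes outputs only at every other position.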
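The modified NMS described in the abstract, which decays a box's confidence according to its overlap with the highest-scoring box instead of deleting it, matches the idea behind Soft-NMS. The abstract does not give the attenuation function, so the sketch below assumes the Gaussian decay s_i ← s_i · exp(−IoU² / σ); the values `sigma=0.5` and `score_thresh=0.001` are illustrative, and the names `iou` and `soft_nms` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms(
    boxes: Sequence[Box],
    scores: Sequence[float],
    sigma: float = 0.5,
    score_thresh: float = 0.001,
) -> List[Tuple[int, float]]:
    """Gaussian Soft-NMS (assumed decay); returns (index, score) in pick order."""
    remaining = {i: s for i, s in enumerate(scores)}
    kept: List[Tuple[int, float]] = []
    while remaining:
        # Select the currently highest-scoring box.
        best = max(remaining, key=remaining.get)
        best_score = remaining.pop(best)
        if best_score < score_thresh:
            break  # every other remaining score is lower still
        kept.append((best, best_score))
        # Attenuate, rather than discard, boxes overlapping the selected one.
        for i in remaining:
            overlap = iou(boxes[best], boxes[i])
            remaining[i] *= math.exp(-(overlap ** 2) / sigma)
    return kept
```

With boxes [(0, 0, 10, 10), (1, 0, 11, 10), (50, 50, 60, 60)] and scores [0.9, 0.8, 0.7], box 1 overlaps box 0 with an IoU of about 0.82, so hard NMS at a 0.5 threshold would remove it. Here it survives with its score decayed to about 0.21 and is ranked after the unrelated box 2, which is how this kind of decay can recover heavily occluded neighbours.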

Keywords