Drones (Feb 2025)

ESO-DETR: An Improved Real-Time Detection Transformer Model for Enhanced Small Object Detection in UAV Imagery

  • Yingfan Liu,
  • Miao He,
  • Bin Hui

DOI
https://doi.org/10.3390/drones9020143
Journal volume & issue
Vol. 9, no. 2
p. 143

Abstract

Object detection is a fundamental capability that enables drones to perform a wide range of tasks. However, balancing detection performance, efficiency, and lightweight design remains a significant challenge for current algorithms. To address this issue, we propose ESO-DETR, an enhanced small object detection transformer model. First, we present a gated single-head attention backbone block, the GSHA block, which enhances the extraction of local details. In addition, ESO-DETR uses a multiscale multihead self-attention (MMSA) mechanism to efficiently handle complex features within its backbone network. We also introduce ESO-FPN, an efficient feature fusion pyramid network for enhanced small object detection that integrates large convolutional kernels with dual-domain attention mechanisms. Lastly, we introduce the EMASlideVariFocal loss (ESVF Loss), which dynamically adjusts the weights to improve the model’s focus on more challenging samples. Compared with the baseline model, ESO-DETR improves the mAP50 metric by 3.9% and 4.0% on the VisDrone and HIT-UAV datasets, respectively, while reducing the number of parameters by 25%. These results highlight the capability of ESO-DETR to improve detection accuracy while maintaining a lightweight and efficient structure.
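
The abstract does not spell out the ESVF Loss, so the following is only a rough sketch of how a loss with that name might be assembled, not the authors' formulation. It assumes three ingredients suggested by the name: the standard Varifocal loss term, a Slide-style weighting function that up-weights samples near an IoU threshold, and an exponential moving average (EMA) of the mean IoU used as that threshold. The names `slide_weight`, `varifocal_term`, and `EmaSlideVarifocal`, the 0.1 margin, and the defaults `decay=0.9`, `init_mu=0.5`, `alpha=0.75`, and `gamma=2.0` are all illustrative assumptions.

```python
import math


def slide_weight(iou, mu):
    """Slide-style weight (assumed form): larger for samples near or above mu."""
    if iou <= mu - 0.1:
        return 1.0                   # easy region: unit weight
    if iou < mu:
        return math.exp(1.0 - mu)    # just below the threshold: boosted
    return math.exp(1.0 - iou)       # at or above the threshold: decays with IoU


def varifocal_term(p, q, alpha=0.75, gamma=2.0):
    """Varifocal loss for one prediction.

    p: predicted classification score in (0, 1)
    q: IoU-aware target (IoU with ground truth for positives, 0 for negatives)
    """
    eps = 1e-9
    p = min(max(p, eps), 1.0 - eps)  # clamp to keep the logarithms finite
    if q > 0:
        return -q * (q * math.log(p) + (1.0 - q) * math.log(1.0 - p))
    return -alpha * (p ** gamma) * math.log(1.0 - p)


class EmaSlideVarifocal:
    """Hypothetical combination: Varifocal terms scaled by a Slide weight
    whose threshold mu follows an EMA of the mean positive IoU."""

    def __init__(self, decay=0.9, init_mu=0.5):
        self.decay = decay
        self.mu = init_mu

    def __call__(self, scores, targets):
        positives = [q for q in targets if q > 0]
        if positives:
            batch_mean = sum(positives) / len(positives)
            # EMA update keeps the threshold stable across noisy batches.
            self.mu = self.decay * self.mu + (1.0 - self.decay) * batch_mean
        total = 0.0
        for p, q in zip(scores, targets):
            total += slide_weight(q, self.mu) * varifocal_term(p, q)
        return total / max(len(scores), 1)


if __name__ == "__main__":
    loss_fn = EmaSlideVarifocal()
    # One positive (IoU 0.8) and one negative sample.
    print(loss_fn([0.9, 0.1], [0.8, 0.0]))
```

In this sketch the threshold `mu` drifts toward the running mean IoU of positive samples, so which samples count as "near the threshold", and therefore receive extra weight, changes as training progresses. That is one plausible reading of "dynamically adjusts the weights"; the published loss may differ in both form and constants.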

Keywords