Современные инновации, системы и технологии (May 2023)

DETR-crowd is all you need

  • Liu Weijia,
  • Zishen Zheng,
  • Ke Fan,
  • Kun He,
  • Taiqiu Huang,
  • Weijia Liu,
  • Xianlun Ke,
  • Yuming Xu

DOI
https://doi.org/10.47813/2782-2818-2023-3-2-0213-0224
Journal volume & issue
Vol. 3, no. 2

Abstract

"Crowded pedestrian detection" is a hot topic in the field of pedestrian detection. To address the issue of missed targets and small pedestrians in crowded scenes, an improved DETR object detection algorithm called DETR-crowd is proposed. The attention model DETR is used as the baseline model to complete object detection in the absence of partial features in crowded pedestrian scenes. The deformable attention encoder is introduced to effectively utilize multi-scale feature maps containing a large amount of small target information to improve the detection accuracy of small pedestrians. To enhance the efficiency of important feature extraction and refinement, the improved EfficientNet backbone network fused with a channel spatial attention module is used for feature extraction. To address the issue of low training efficiency of models that use attention detection modules, Smooth-L1 and GIOU are combined as the loss function during training, allowing the model to converge to higher precision. Experimental results on the Wider-Person crowded pedestrian detection dataset show that the proposed algorithm leads YOLO-X by 0.039 in AP50 accuracy and YOLO-V5 by 0.015 in AP50 accuracy. The proposed algorithm can be effectively applied to crowded pedestrian detection tasks.