Jisuanji kexue yu tansuo (May 2024)

Dense Pedestrian Detection Based on Shifted Window Attention Multi-scale Equalization

  • YU Fan, ZHANG Jing

DOI
https://doi.org/10.3778/j.issn.1673-9418.2303110
Journal volume & issue
Vol. 18, no. 5
pp. 1286 – 1300

Abstract


Pedestrian targets in real-world scenes vary widely in shape and scale, and traditional methods often achieve low average precision in pedestrian detection. Transformer-based networks with attention mechanisms have shown strong performance in this field, but multi-scale detection in dense scenes remains difficult. Dense scenes usually contain many occluded or small pedestrians, which causes numerous false and missed detections and consumes substantial computing resources. When pedestrians overlap heavily, detecting every target accurately becomes especially hard. To address these issues, a multi-scale pedestrian detection algorithm for dense scenes based on shifted window attention is proposed. Modified Swin blocks in the backbone let the network extract finer-grained features while reducing the heavy computational cost of attention. To improve feature fusion, DyHead blocks in the neck unify multiple attention operations, making fusion more efficient. To balance features across scales, a feature scale-equalization module based on full connections is designed; it builds different residual structures between the levels of the feature pyramid to balance their features and help the model generate higher-quality feature maps. Experimental results on the WiderPerson dataset show that the algorithm improves AP by 1.1 percentage points, with gains of 1.0 and 0.7 percentage points on small and medium targets, respectively, which are the most important categories.
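The abstract does not describe how the Swin blocks were modified, so the sketch below only illustrates the standard shifted-window mechanism from the original Swin Transformer that such blocks build on: the feature map is split into non-overlapping M x M windows, attention is computed only inside each window, and alternating blocks cyclically shift the map by M/2 so that information crosses window borders. It is a minimal sketch, assuming a single-channel grid of token indices in plain Python; the function names, the 8 x 8 map, and M = 4 are hypothetical choices for illustration, and the attention mask Swin applies to wrapped-around tokens is omitted.

```python
from typing import List

Grid = List[List[int]]


def make_grid(h: int, w: int) -> Grid:
    """Label each token of an h x w feature map with its flat index."""
    return [[r * w + c for c in range(w)] for r in range(h)]


def cyclic_shift(grid: Grid, shift: int) -> Grid:
    """Roll the map up and to the left by `shift` tokens, wrapping at the edges."""
    h, w = len(grid), len(grid[0])
    return [[grid[(r + shift) % h][(c + shift) % w] for c in range(w)] for r in range(h)]


def window_partition(grid: Grid, m: int) -> List[List[int]]:
    """Split an h x w map into non-overlapping m x m windows (row-major order).

    Each window is returned as a flat list of token indices; in a real block,
    self-attention would be computed only among the tokens of one window.
    """
    h, w = len(grid), len(grid[0])
    assert h % m == 0 and w % m == 0, "map size must be divisible by the window size"
    windows = []
    for top in range(0, h, m):
        for left in range(0, w, m):
            windows.append([grid[r][c] for r in range(top, top + m) for c in range(left, left + m)])
    return windows


def attention_pairs_global(h: int, w: int) -> int:
    """Query-key pairs scored by global attention: (hw)^2."""
    n = h * w
    return n * n


def attention_pairs_windowed(h: int, w: int, m: int) -> int:
    """Query-key pairs scored by window attention: hw tokens, each attending to m*m tokens."""
    return h * w * m * m


if __name__ == "__main__":
    grid = make_grid(8, 8)
    regular = window_partition(grid, 4)
    shifted = window_partition(cyclic_shift(grid, 2), 4)  # shift = M // 2
    print(len(regular), regular[0][:4], shifted[0][:4])
    print(attention_pairs_global(8, 8), attention_pairs_windowed(8, 8, 4))
```

On this 8 x 8 map with M = 4, window attention scores 1,024 query-key pairs instead of 4,096 for global attention. For a fixed window size the cost grows linearly with the number of tokens rather than quadratically, which is the kind of saving the abstract attributes to the Swin-based backbone. The shifted partition regroups tokens that sat in different windows before the shift (its first window starts at token 18), which is how shifted windows exchange information between neighbouring windows.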

Keywords