IEEE Access (Jan 2024)
Multi-Scale Structure Perception and Global Context-Aware Method for Small-Scale Pedestrian Detection
Abstract
In pedestrian detection, small-scale pedestrians often occupy only a few pixels and provide insufficient features, which frequently leads to false or missed detections. Therefore, this paper proposes a multi-scale structure perception and global context-aware method for small-scale pedestrian detection. First, to address the loss of features as the network deepens, we design a feature fusion strategy that overcomes the constraints of the feature pyramid hierarchy. This strategy combines deep and shallow feature maps and leverages the Transformer's ability to capture long-range dependencies, incorporating a global context information module to retain a substantial amount of small-scale pedestrian features. Second, considering that small-scale pedestrian features are easily confused with background information, we employ a combination of self-attention and channel attention modules to jointly model the spatial and channel correlations of feature maps. Exploiting the context and channel information of small-scale pedestrians enhances their features while suppressing the background. Finally, to address the issue of gradient explosion during model training, we introduce a novel weighted loss function named ES-IoU, which significantly improves convergence speed. Extensive experimental results on the CityPersons and CrowdHuman datasets demonstrate that the proposed method achieves substantial improvements over state-of-the-art methods.
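To make the joint spatial/channel modeling mentioned above concrete, the following is a minimal PyTorch sketch of the general pattern the abstract describes: spatial self-attention over pixel positions combined with squeeze-and-excitation-style channel attention. The module name `SpatialChannelAttention`, the `reduction` ratio, and all layer shapes are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class SpatialChannelAttention(nn.Module):
    """Illustrative joint spatial + channel attention over a feature map.

    Spatial self-attention models correlations between pixel positions,
    while channel attention reweights feature channels; the combination
    is a generic sketch of the strategy described in the abstract.
    """
    def __init__(self, channels, reduction=16):
        super().__init__()
        # Channel attention: global average pool followed by a bottleneck MLP.
        self.channel_fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )
        # Spatial self-attention: 1x1 projections for query / key / value.
        self.q = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.k = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.v = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable residual scale

    def forward(self, x):
        b, c, h, w = x.shape
        # Spatial self-attention over all h*w positions.
        q = self.q(x).flatten(2).transpose(1, 2)              # (b, hw, c/8)
        k = self.k(x).flatten(2)                               # (b, c/8, hw)
        attn = torch.softmax(q @ k / (q.shape[-1] ** 0.5), dim=-1)  # (b, hw, hw)
        v = self.v(x).flatten(2).transpose(1, 2)               # (b, hw, c)
        spatial = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        x = x + self.gamma * spatial
        # Channel attention: reweight channels with pooled statistics.
        weights = self.channel_fc(x.mean(dim=(2, 3)))          # (b, c)
        return x * weights.view(b, c, 1, 1)
```

In this sketch the spatial branch is applied first as a residual refinement and the channel branch then rescales the result; the actual ordering and fusion used in the paper may differ.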
Keywords