IET Image Processing (Nov 2024)
A fused score computation approach to reflect the overlap between the predicted box and the ground truth in pedestrian detection
Abstract
In pedestrian detection, a model generates numerous predicted boxes together with their corresponding scores, and these scores are used to filter the predicted boxes by non‐maximum suppression. This paper analysed the training process of popular anchor‐based pedestrian detection models (e.g. YOLO and Faster RCNN) and found that the score of a predicted box reflects the overlap between the corresponding anchor and the ground truth, rather than the overlap between the predicted box itself and the ground truth. Because anchor‐based methods adopt a many‐to‐one strategy, multiple predicted boxes can be generated around one predicted box. This study refers to the number of other predicted boxes around a target predicted box as its local density. A predicted box with a higher local density is expected to have a greater overlap with the ground truth. Therefore, this study proposes the fused score, which introduces local density into the score. The experiments showed that replacing the score with the fused score effectively improves the model's detection accuracy. The code and experiments will soon be open‐sourced at https://github.com/zefeichen/FusedScore.
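The idea of local density can be illustrated with a minimal sketch. The abstract does not give the exact definition of "around" or the fusion formula, so everything specific here is an assumption made for illustration: neighbours are counted with an IoU threshold (`iou_thresh`), the fusion is a simple weighted sum controlled by a hypothetical parameter `alpha`, and the function names are not taken from the authors' repository.

```python
import numpy as np

def pairwise_iou(boxes):
    """IoU between every pair of boxes given as (x1, y1, x2, y2)."""
    boxes = np.asarray(boxes, dtype=float)
    x1 = np.maximum(boxes[:, None, 0], boxes[None, :, 0])
    y1 = np.maximum(boxes[:, None, 1], boxes[None, :, 1])
    x2 = np.minimum(boxes[:, None, 2], boxes[None, :, 2])
    y2 = np.minimum(boxes[:, None, 3], boxes[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    union = area[:, None] + area[None, :] - inter
    return inter / np.maximum(union, 1e-12)

def local_density(boxes, iou_thresh=0.5):
    """Number of OTHER boxes overlapping each box (assumed IoU criterion)."""
    iou = pairwise_iou(boxes)
    np.fill_diagonal(iou, 0.0)  # a box is not its own neighbour
    return (iou >= iou_thresh).sum(axis=1)

def fused_score(scores, density, alpha=0.5):
    """Illustrative fusion only: weighted sum of score and normalised density."""
    scores = np.asarray(scores, dtype=float)
    norm = density / max(density.max(), 1)  # scale density to [0, 1]
    return (1 - alpha) * scores + alpha * norm

# Three mutually overlapping boxes and one isolated box with a higher raw score.
boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [0, 1, 10, 11], [50, 50, 60, 60]]
scores = [0.6, 0.6, 0.6, 0.9]
density = local_density(boxes)
fused = fused_score(scores, density)
```

In this toy input the isolated box has the highest raw score but no neighbours, so under the assumed fusion it ranks below the clustered boxes. The fused values would then replace the raw scores as the ranking criterion in non‐maximum suppression.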
Keywords