IEEE Access (Jan 2024)
Analysis of Fine-Grained Counting Methods for Masked Face Counting: A Comparative Study
Abstract
Masked face counting involves counting faces at various crowd densities while discriminating between masked and unmasked faces, and it is generally treated as an object (i.e., face) detection task. With this approach, counting accuracy is limited, especially at higher densities, where faces are relatively small, unclear, and viewed at various angles. Furthermore, it is costly to create the ground-truth bounding boxes needed to train object detection methods. We formulate masked face counting as a fine-grained crowd-counting task, which, when combined with density map regression, is well suited to these challenging conditions. However, adopting fine-grained crowd-counting methods for masked face counting is not trivial: strategies appropriate for both counting and multi-class classification must be identified. We contrasted the strategies of various approaches and examined their benefits and drawbacks. The strategies compared are (1) simple regression versus mixed regression and detection for counting, (2) class-aware density maps versus semantic segmentation maps and class probabilities for classification, and (3) counting with versus without depth information enhancement. Analysis of seven crowd-counting methods on three datasets with a total of about 900k annotations demonstrated that the level of congestion affects how well simple regression and mixed regression and detection perform for counting, and that the most effective approach for classification is using semantic segmentation maps. Evaluation of depth information demonstrated that a depth map is needed to achieve accurate counting. These findings should be useful for future studies.
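To make the density-map formulation concrete, the following is a minimal sketch of how per-class counts can be read from class-aware density maps built from point annotations. It is not the implementation of any of the compared methods: the function names, the Gaussian kernel parameters, and the class labelling (0 = masked, 1 = unmasked) are assumptions chosen for illustration, and in practice a network would regress these maps rather than build them from known points.

```python
import math


def gaussian_kernel(radius, sigma):
    """Return a (2*radius+1) x (2*radius+1) Gaussian kernel that sums to 1."""
    size = 2 * radius + 1
    kernel = [
        [
            math.exp(-((x - radius) ** 2 + (y - radius) ** 2) / (2 * sigma ** 2))
            for x in range(size)
        ]
        for y in range(size)
    ]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]


def class_density_maps(points, height, width, num_classes, radius=3, sigma=1.5):
    """Build one density map per class from (x, y, class_id) point annotations.

    Assumes every point lies inside the image. Kernel mass that would fall
    outside the image is renormalized, so each annotated face contributes
    exactly 1 to the map of its class.
    """
    kernel = gaussian_kernel(radius, sigma)
    maps = [[[0.0] * width for _ in range(height)] for _ in range(num_classes)]
    for x, y, cls in points:
        # Collect the in-bounds cells covered by the kernel and their weights.
        cells = []
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                yy, xx = y + dy, x + dx
                if 0 <= yy < height and 0 <= xx < width:
                    cells.append((yy, xx, kernel[dy + radius][dx + radius]))
        mass = sum(w for _, _, w in cells)
        for yy, xx, w in cells:
            maps[cls][yy][xx] += w / mass
    return maps


def counts_from_maps(maps):
    """The count for each class is the sum (integral) of its density map."""
    return [sum(sum(row) for row in m) for m in maps]


# Hypothetical annotations: two masked faces (class 0), one unmasked (class 1).
annotations = [(5, 5, 0), (0, 0, 0), (10, 7, 1)]
maps = class_density_maps(annotations, height=12, width=16, num_classes=2)
counts = counts_from_maps(maps)
print([round(c, 6) for c in counts])
```

Because the count is obtained by summing the map rather than by localizing each face, this formulation needs only point annotations instead of bounding boxes, which is the annotation-cost advantage noted above.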
Keywords