IEEE Access (Jan 2025)
EMSPAN: Efficient Multi-Scale Pyramid Attention Network for Object Counting Under Size Heterogeneity and Dense Scenarios
Abstract
Computer vision is an increasingly important field with many real-world applications. Object counting is one of its core tasks and is used in a growing range of scientific fields where the objects of interest vary in size. Traditional counting methods, however, struggle in dense scenes and are often ineffective when objects differ markedly in size. To address these challenges, this paper proposes the Efficient Multi-Scale Pyramid Attention Network (EMSPAN), a model designed for counting tasks that are both dense and size-heterogeneous. The paper also introduces a novel method for generating ground-truth density maps with size-adaptive Gaussian kernels, which sets each kernel's size according to the dimensions of the corresponding object. This approach preserves spatial information more effectively and produces more accurate density maps, even in complex scenes. EMSPAN uses attention mechanisms to capture both the multi-scale spatial distribution of objects and their size variations. Experiments on shrimp larvae and crowd datasets, both characterized by substantial variation in individual object size, demonstrate the superior performance of the proposed method for object counting in dense and size-heterogeneous environments.
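To illustrate the general idea of a size-adaptive ground-truth density map, the following is a minimal sketch. The abstract does not state how kernel size is derived from object dimensions. The sketch therefore assumes that each annotation is a box `(cx, cy, w, h)` in pixels and that the Gaussian standard deviation along each axis is proportional to the box width or height through a hypothetical factor `alpha`. The function name, parameters, and scaling rule are illustrative assumptions, not the EMSPAN authors' formula. Following the usual convention in density-based counting, each object's kernel is normalized to sum to 1, so the map integrates to the object count.

```python
import numpy as np


def size_adaptive_density_map(height, width, boxes, alpha=0.25, min_sigma=1.0):
    """Build a density map whose Gaussian spread follows each object's size.

    Hypothetical sketch, not the EMSPAN method:
      boxes     -- iterable of (cx, cy, w, h) in pixels (assumed annotation format)
      alpha     -- assumed factor mapping object size to Gaussian sigma
      min_sigma -- floor on sigma so very small objects still get a usable kernel
    """
    density = np.zeros((height, width), dtype=np.float64)
    # Pixel coordinate grids: ys[i, j] = i (row), xs[i, j] = j (column).
    ys, xs = np.mgrid[0:height, 0:width]
    for cx, cy, w, h in boxes:
        # Per-axis sigmas, so elongated objects get elongated kernels.
        sx = max(alpha * w, min_sigma)
        sy = max(alpha * h, min_sigma)
        kernel = np.exp(-((xs - cx) ** 2 / (2 * sx ** 2)
                          + (ys - cy) ** 2 / (2 * sy ** 2)))
        total = kernel.sum()
        if total > 0:
            # Normalize over the image so each object adds exactly 1 to the
            # count, even if part of its kernel falls outside the frame.
            density += kernel / total
    return density


if __name__ == "__main__":
    # One small object and one large object in a 64x64 image.
    dm = size_adaptive_density_map(64, 64, [(10, 10, 4, 4), (40, 30, 20, 12)])
    print(round(dm.sum(), 6))  # estimated count
    # The small object's peak is sharper than the large object's (index is [y, x]).
    print(dm[10, 10] > dm[30, 40])
```

Because the kernel is evaluated over the full image for every object, this sketch costs O(N·H·W). A practical implementation would typically truncate each kernel to a window of a few sigmas around its center.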
Keywords