An effective modular approach for crowd counting in an image using convolutional neural networks

Naveed Ilyas; Zaheer Ahmad; Boreom Lee; Kiseon Kim

doi:10.1038/s41598-022-09685-w

Scientific Reports (Apr 2022)


Affiliations

Naveed Ilyas: Department of Biomedical Science and Engineering, Gwangju Institute of Science and Technology (GIST)
Zaheer Ahmad: Department of Computer Science, COMSATS University Islamabad
Boreom Lee: Department of Biomedical Science and Engineering, Gwangju Institute of Science and Technology (GIST)
Kiseon Kim: School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology

DOI: https://doi.org/10.1038/s41598-022-09685-w
Journal volume & issue: Vol. 12, no. 1
pp. 1 – 12

Abstract

The abrupt and continuous nature of scale variation in a crowded scene makes it challenging to improve crowd counting accuracy in an image. Existing crowd counting techniques generally use multi-column or single-column dilated convolution to tackle scale variation caused by perspective distortion. However, due to their multi-column nature, multi-column approaches obtain identical features, whereas standard dilated convolution (SDC) with an expanded receptive field has a sparse pixel sampling rate. Because of this sparsity, it is highly challenging for SDC to obtain relevant contextual information. Further, features at multiple scales are not extracted when an inception-based model is not used (which is cost effective). To mitigate these drawbacks of SDC, we propose hierarchical dense dilated deep pyramid feature extraction through a convolutional neural network (CNN) for single-image crowd counting (HDPF). It comprises three modules: a general feature extraction module (GFEM), a deep pyramid feature extraction module (PFEM) and a fusion module (FM). The GFEM is responsible for obtaining task-independent general features, whereas the PFEM plays a vital role in obtaining the relevant contextual information, owing to the dense pixel sampling rate produced by densely connected dense stacked dilated convolutional modules (DSDCs). Further, due to the dense connections among DSDCs, the final feature map acquires multi-scale information with an expanded receptive field compared to SDC. Owing to its dense pyramid nature, the PFEM effectively propagates the extracted features from lower dilated convolutional layers (DCLs) to middle and higher DCLs, which results in better estimation accuracy. The FM fuses the incoming features extracted by the other modules. The proposed technique is tested through simulations on three well-known datasets: ShanghaiTech (Part-A), ShanghaiTech (Part-B) and Venice. The results justify its relative effectiveness in terms of the selected performance metrics.
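
The contrast between the sparse pixel sampling of a single dilated convolution and the denser sampling of stacked dilated convolutions can be illustrated with a small 1-D sketch. This is not the authors' HDPF architecture: the kernel size (3), the dilation rates (6 for the single layer, 1, 2, 3 for the stack) and the function names are assumptions chosen only to make the idea concrete.

```python
def sampled_offsets(kernel_size, dilation):
    """Input offsets read by one 1-D dilated convolution (odd kernel size)."""
    half = kernel_size // 2
    return {dilation * k for k in range(-half, half + 1)}


def stacked_offsets(kernel_size, dilations):
    """Input offsets that can influence one output after stacking layers."""
    reached = {0}
    for d in dilations:
        taps = sampled_offsets(kernel_size, d)
        # Each already-reached position fans out through this layer's taps.
        reached = {r + t for r in reached for t in taps}
    return reached


def coverage(offsets):
    """Return (receptive field span, fraction of that span actually sampled)."""
    span = max(offsets) - min(offsets) + 1
    return span, len(offsets) / span


# Hypothetical settings: one layer with dilation 6 versus a stack with 1, 2, 3.
single = coverage(sampled_offsets(3, 6))
stacked = coverage(stacked_offsets(3, [1, 2, 3]))

print("single dilated layer :", single)
print("stacked dilated layers:", stacked)
```

Under these assumed settings both configurations span the same 13 input positions, but the single layer reads only 3 of them while the stack reaches all 13, which is the sampling-density gap the abstract attributes to SDC.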

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/
