Dual attention module and multi‐label based fully convolutional network for crowd counting

Suyu Wang; Bin Yang; Bo Liu; Guanghui Zheng

doi:10.1049/iet-cvi.2019.0674

IET Computer Vision (Oct 2020)

Affiliations

Suyu Wang: Beijing Engineering Research Center for IoT Software and Systems, Beijing 100124, Beijing, People's Republic of China
Bin Yang: Beijing Engineering Research Center for IoT Software and Systems, Beijing 100124, Beijing, People's Republic of China
Bo Liu: Faculty of Information Technology, Beijing University of Technology, Beijing 100124, Beijing, People's Republic of China
Guanghui Zheng: Beijing Engineering Research Center for IoT Software and Systems, Beijing 100124, Beijing, People's Republic of China

DOI: https://doi.org/10.1049/iet-cvi.2019.0674
Journal volume & issue: Vol. 14, no. 7, pp. 443–451

Abstract

High‐density crowd counting in natural scenes is a challenging research problem in computer vision. Although algorithms based on convolutional neural networks have achieved significantly better results than traditional algorithms, most of them focus on the local features of images and have difficulty capturing rich global contextual dependencies. To solve this problem, a fully convolutional network based on a dual attention module and multiple labels is proposed in this study. The authors improve the algorithm from three perspectives. Firstly, a dual attention module is introduced so that global context and long‐range dependencies are adaptively integrated along both the spatial and channel dimensions, which improves the representational ability of the network. Secondly, a multi‐label mechanism is designed to reduce the prediction error: the crowd‐counting task is additionally cast as a foreground and background segmentation task that assists the regression of the density map. Thirdly, in addition to the traditional Euclidean distance loss and cross‐entropy loss, the structural similarity index is introduced to further improve the training of the model. Test results on the UCF_CC_50, ShanghaiTech, and UCF‐QNRF datasets indicate that the proposed method outperforms current mainstream algorithms.
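
The abstract names three loss terms (Euclidean distance, cross‐entropy, and structural similarity) but does not state how they are combined. The following is a minimal sketch of one plausible combination, not the authors' formulation: the weights `w_ce` and `w_ssim`, the single‐window SSIM, and all function names are assumptions made for illustration, and maps are represented as flat lists of values in [0, 1].

```python
import math

# Stabilising constants from the standard SSIM definition for data in [0, 1].
C1 = 0.01 ** 2
C2 = 0.03 ** 2


def euclidean_loss(pred, target):
    """Half the mean squared difference between two density maps (flat lists)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / (2 * len(pred))


def cross_entropy_loss(prob, mask, eps=1e-7):
    """Binary cross-entropy between foreground probabilities and a 0/1 mask."""
    total = 0.0
    for p, m in zip(prob, mask):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= m * math.log(p) + (1 - m) * math.log(1.0 - p)
    return total / len(prob)


def global_ssim(x, y):
    """SSIM over the whole map as one window (a simplification, assumed here)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    num = (2 * mx * my + C1) * (2 * cov + C2)
    den = (mx ** 2 + my ** 2 + C1) * (vx + vy + C2)
    return num / den


def combined_loss(pred_density, gt_density, pred_fg, gt_mask,
                  w_ce=0.1, w_ssim=0.1):
    """Weighted sum of the three terms; the weights are illustrative assumptions."""
    return (euclidean_loss(pred_density, gt_density)
            + w_ce * cross_entropy_loss(pred_fg, gt_mask)
            + w_ssim * (1.0 - global_ssim(pred_density, gt_density)))
```

In this sketch the segmentation branch contributes only through the cross‐entropy term, while the density map is penalised both pixel‐wise (Euclidean) and structurally (1 − SSIM). In practice SSIM is usually computed over local sliding windows rather than a single global window.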

Published in IET Computer Vision

ISSN: 1751-9632 (Print); 1751-9640 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software
Website: https://ietresearch.onlinelibrary.wiley.com/journal/17519640
