IEEE Access (Jan 2018)
DA-Net: Learning the Fine-Grained Density Distribution With Deformation Aggregation Network
Abstract
The major challenges for accurate crowd counting stem from large variations in scale, shape, and perspective. Handling these variations depends on the geometric transformation capability of the network. We therefore propose the deformation aggregation network (DA-Net), which incrementally incorporates adaptive receptive fields to capture the fine-grained density distribution. Our model starts from a basic diamond network in which the number of channels successively rises and then falls. This network uses small, fixed-size kernels to preserve spatial information. To accommodate scale variations, we further apply deformable convolutions. These augment the spatial sampling locations with learned offsets, which delivers precise localization and makes the network robust to geometric transformations across diverse scenes. In addition, we assign adjustable weights so that the network learns the fusion proportions adaptively, aggregating multiple levels of abstraction into the final density map. DA-Net is a fully convolutional network and can take inputs of arbitrary size. The proposed method achieves state-of-the-art performance on four benchmarks (ShanghaiTech, WorldExpo'10, UCSD, and UCF_CC_50). To further verify the generality of our model, we conduct an extended experiment on the vehicle dataset TRANCOS and achieve significant results.
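The two mechanisms named above, offset-augmented sampling and weighted fusion of density maps, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not DA-Net's implementation. It evaluates a single output of a 3x3 deformable convolution using the standard formulation y(p0) = Σ_k w_k · x(p0 + p_k + Δp_k), with bilinear interpolation at fractional positions and zero padding outside the map. It then fuses several same-size density maps. Normalizing the fusion weights with a softmax over learnable logits is an assumption, and the function names (`bilinear`, `deform_conv_at`, `fuse_density_maps`, `count`) are hypothetical. The crowd count is read off as the sum of the density map.

```python
import math

# Regular 3x3 sampling grid p_k relative to the output position p0.
GRID = [(-1, -1), (-1, 0), (-1, 1),
        (0, -1), (0, 0), (0, 1),
        (1, -1), (1, 0), (1, 1)]


def bilinear(feat, y, x):
    """Sample a 2-D map (list of lists) at fractional (y, x); zero outside the map."""
    h, w = len(feat), len(feat[0])
    y0, x0 = math.floor(y), math.floor(x)
    val = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                # Bilinear weight decays linearly with distance along each axis.
                wgt = (1 - abs(y - yy)) * (1 - abs(x - xx))
                val += wgt * feat[yy][xx]
    return val


def deform_conv_at(feat, kernel, offsets, p0):
    """One deformable-conv output: sum_k kernel[k] * x(p0 + p_k + offsets[k]).

    kernel: 9 weights in GRID order; offsets: 9 (dy, dx) pairs, which a real
    network would predict from the input (hypothetical inputs here).
    """
    py, px = p0
    out = 0.0
    for k, (dy, dx) in enumerate(GRID):
        oy, ox = offsets[k]
        out += kernel[k] * bilinear(feat, py + dy + oy, px + dx + ox)
    return out


def fuse_density_maps(maps, logits):
    """Weighted sum of same-size density maps; weights = softmax(logits) (assumption)."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    h, w = len(maps[0]), len(maps[0][0])
    return [[sum(wt * mp[i][j] for wt, mp in zip(weights, maps))
             for j in range(w)] for i in range(h)]


def count(density):
    """Estimated number of objects = integral (sum) of the density map."""
    return sum(sum(row) for row in density)
```

With all offsets set to zero, `deform_conv_at` reduces to an ordinary 3x3 convolution (cross-correlation) at `p0`; nonzero offsets shift each sampling point independently, which is what lets the receptive field adapt to object scale and shape.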
Keywords