IEEE Access (Jan 2021)

An Approach to Detect Anomaly in Video Using Deep Generative Network

  • Savath Saypadith,
  • Takao Onoye

DOI
https://doi.org/10.1109/ACCESS.2021.3126335
Journal volume & issue
Vol. 9
pp. 150903 – 150910

Abstract

Anomaly detection in video has recently gained attention because of its importance in intelligent surveillance systems. Although state-of-the-art methods perform competitively on benchmark datasets, the trade-off between computational resources and detection accuracy should also be considered. In this paper, we present a framework for detecting anomalies in video. We propose a “multi-scale U-Net” network architecture for unsupervised anomaly detection in video, based on a generative adversarial network (GAN) structure. Shortcut Inception Modules (SIMs) and residual skip connections are employed in the generator network to improve the training and testing ability of the neural network. Asymmetric convolutions are applied in place of traditional convolution layers to decrease the number of training parameters without a penalty in detection accuracy. In the training phase, the generator network is trained to generate normal events, making the generated image as similar as possible to the ground truth. The multi-scale U-Net preserves useful image features that would otherwise be lost during training because of the convolution operator. The generator network is trained by minimizing the reconstruction error on normal data, and the reconstruction error is then used as an indicator of anomalies in the testing phase. The proposed framework is evaluated on three benchmark datasets: UCSD Pedestrian, CUHK Avenue, and ShanghaiTech. It surpasses state-of-the-art learning-based methods on all of these datasets, achieving AUCs of 95.7%, 86.9%, and 73.0%, respectively. Moreover, the numbers of training and testing parameters in our framework are reduced compared with the baseline network architecture, while the detection accuracy is still improved.
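
Two ideas in the abstract can be illustrated with a short sketch: the parameter saving from asymmetric convolution, and the use of reconstruction error as an anomaly indicator. The abstract does not give the exact layer configuration or scoring formula, so everything below is an assumption made for illustration: the factorization of a k×k convolution into a 1×k convolution followed by a k×1 convolution, the 64-channel layer size, the use of mean squared error, and the per-video min-max normalization. All function names are hypothetical.

```python
def conv_params(c_in, c_out, kh, kw, bias=True):
    """Number of learnable parameters in one 2-D convolution layer."""
    return c_in * c_out * kh * kw + (c_out if bias else 0)


def asymmetric_params(c_in, c_out, k, bias=True):
    """Parameters of a 1xk convolution followed by a kx1 convolution.

    Assumed factorization; the paper's exact layout may differ.
    """
    first = conv_params(c_in, c_out, 1, k, bias)
    second = conv_params(c_out, c_out, k, 1, bias)
    return first + second


def mse(frame, reconstruction):
    """Mean squared error between two equally sized flat lists of pixels."""
    if len(frame) != len(reconstruction):
        raise ValueError("frame and reconstruction must have the same size")
    total = sum((a - b) ** 2 for a, b in zip(frame, reconstruction))
    return total / len(frame)


def anomaly_scores(errors):
    """Min-max normalize per-frame errors to [0, 1]; higher is more anomalous."""
    lo, hi = min(errors), max(errors)
    if hi == lo:
        # All frames reconstructed equally well: no frame stands out.
        return [0.0] * len(errors)
    return [(e - lo) / (hi - lo) for e in errors]


if __name__ == "__main__":
    # Hypothetical layer: 64 input channels, 64 output channels, 3x3 kernel.
    print(conv_params(64, 64, 3, 3))     # standard 3x3 convolution
    print(asymmetric_params(64, 64, 3))  # 1x3 followed by 3x1
    # Hypothetical per-frame reconstruction errors for a short clip.
    print(anomaly_scores([0.5, 1.0, 1.5]))
```

Under these assumptions the factorized pair needs roughly two thirds of the parameters of the standard 3×3 layer, and frames that the generator reconstructs poorly receive scores close to 1.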

Keywords