IEEE Access (Jan 2024)

Spatially Aware Fusion in 3D Convolutional Autoencoders for Video Anomaly Detection

  • Asim Niaz,
  • Sareer Ul Amin,
  • Shafiullah Soomro,
  • Hamza Zia,
  • Kwang Nam Choi

DOI
https://doi.org/10.1109/ACCESS.2024.3435144
Journal volume & issue
Vol. 12
pp. 104770–104784

Abstract


Surveillance videos are crucial for crime prevention and public safety, yet the difficulty of defining abnormal events limits the applicability of supervised methods. This paper introduces an unsupervised end-to-end architecture for video anomaly detection that exploits spatial and temporal features to identify anomalies in surveillance footage. The model employs a three-dimensional (3D) convolutional autoencoder whose encoder-decoder structure learns spatiotemporal representations and reconstructs the input through the latent space. Skip connections linking the encoder and decoder blocks transfer information across multiple scales of feature representation, enhancing the reconstruction process and improving overall performance. The architecture incorporates spatial attention modules that highlight informative regions of the input, enabling improved anomaly detection. Spatial and contextual dependencies are further captured by the 3D convolutional filters. The proposed model is evaluated on four benchmark datasets: UCSD Pedestrian 1, UCSD Pedestrian 2, CUHK Avenue, and ShanghaiTech. Notably, it achieves frame-level Area Under the Curve (AUC) scores of 94.6% on UCSD Ped 1, 96.7% on UCSD Ped 2, 84.7% on CUHK Avenue, and 74.8% on ShanghaiTech. These results demonstrate state-of-the-art performance, highlighting the approach's efficacy in real-world anomaly detection scenarios.
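
To make the described pipeline concrete, the following is a minimal PyTorch sketch of such an architecture. The abstract does not give implementation details, so the layer counts, channel widths, additive skip connections, the pooling-based spatial attention, and the per-frame reconstruction-error score are illustrative assumptions, not the authors' exact configuration.

# Minimal sketch (not the authors' code): a 3D convolutional autoencoder
# with an encoder-decoder skip connection and spatial attention, as
# described in the abstract. All hyperparameters are assumptions.
import torch
import torch.nn as nn

class SpatialAttention3D(nn.Module):
    """Assumed attention design: weight each spatiotemporal location by a
    sigmoid mask computed from channel-wise average- and max-pooled maps."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv3d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)           # (N, 1, T, H, W)
        mx, _ = x.max(dim=1, keepdim=True)          # (N, 1, T, H, W)
        mask = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * mask                             # highlight informative regions

class Conv3DAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: two 3D conv blocks, each halving T, H, and W.
        self.enc1 = nn.Sequential(nn.Conv3d(1, 32, 3, stride=2, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv3d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.att1 = SpatialAttention3D()
        self.att2 = SpatialAttention3D()
        # Decoder: transposed 3D convs mirroring the encoder.
        self.dec2 = nn.Sequential(
            nn.ConvTranspose3d(64, 32, 3, stride=2, padding=1, output_padding=1),
            nn.ReLU())
        self.dec1 = nn.ConvTranspose3d(32, 1, 3, stride=2, padding=1, output_padding=1)

    def forward(self, x):                           # x: (N, 1, T, H, W) clip
        e1 = self.att1(self.enc1(x))
        e2 = self.att2(self.enc2(e1))               # latent spatiotemporal code
        d2 = self.dec2(e2) + e1                     # skip connection (additive, assumed)
        return torch.sigmoid(self.dec1(d2))         # reconstructed clip

# Anomaly scoring: per-frame reconstruction error; poorly reconstructed
# (high-error) frames are flagged as anomalous.
clip = torch.rand(1, 1, 16, 64, 64)                 # toy 16-frame grayscale clip
model = Conv3DAutoencoder()
recon = model(clip)
score = ((recon - clip) ** 2).mean(dim=(1, 3, 4))   # error per frame, shape (1, 16)

At inference time, these per-frame errors would typically be normalized per video and thresholded (or swept to compute the frame-level AUC reported above).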

Keywords