Advanced Intelligent Systems (Jul 2024)

Video Anomaly Detection Utilizing Efficient Spatiotemporal Feature Fusion with 3D Convolutions and Long Short‐Term Memory Modules

  • Sareer Ul Amin,
  • Bumsoo Kim,
  • Yonghoon Jung,
  • Sanghyun Seo,
  • Sangoh Park

DOI
https://doi.org/10.1002/aisy.202300706
Journal volume & issue
Vol. 6, no. 7

Abstract

Surveillance cameras produce vast amounts of video data, posing a challenge for analysts due to the infrequent occurrence of unusual events. To address this, intelligent surveillance systems leverage AI and computer vision to automatically detect anomalies. This study proposes an innovative method combining 3D convolutions and long short‐term memory (LSTM) modules to capture spatiotemporal features in video data. Notably, a structured coarse‐level feature fusion mechanism enhances generalization and mitigates the issue of vanishing gradients. Unlike traditional convolutional neural networks, the approach employs depth‐wise feature stacking, reducing computational complexity and enhancing the architecture. Additionally, it integrates microautoencoder blocks for downsampling, eliminates the computational load of ConvLSTM2D layers, and employs frequent feature concatenation blocks during upsampling to preserve temporal information. Integrating a Conv‐LSTM module at the down‐ and upsampling stages enhances the model's ability to capture short‐ and long‐term temporal features, resulting in a 42‐layer network while maintaining robust performance. Experimental results demonstrate significant reductions in false alarms and improved accuracy compared to contemporary methods, with enhancements of 2.7%, 0.6%, and 3.4% on the UCSDPed1, UCSDPed2, and Avenue datasets, respectively.

Keywords