Video Anomaly Detection Utilizing Efficient Spatiotemporal Feature Fusion with 3D Convolutions and Long Short‐Term Memory Modules

Sareer Ul Amin; Bumsoo Kim; Yonghoon Jung; Sanghyun Seo; Sangoh Park

doi:10.1002/aisy.202300706

Advanced Intelligent Systems (Jul 2024)

Affiliations

Sareer Ul Amin: Department of Computer Science and Engineering Chung‐Ang University Seoul 06974 South Korea
Bumsoo Kim: College of Art and Technology Chung‐Ang University Anseong 17546 South Korea
Yonghoon Jung: Department of Advanced Imaging Science Multimedia & Film Chung‐Ang University Seoul 06974 South Korea
Sanghyun Seo: College of Art and Technology Chung‐Ang University Anseong 17546 South Korea
Sangoh Park: Department of Computer Science and Engineering Chung‐Ang University Seoul 06974 South Korea

DOI: https://doi.org/10.1002/aisy.202300706
Journal volume & issue: Vol. 6, no. 7

Abstract

Surveillance cameras produce vast amounts of video data, posing a challenge for analysts due to the infrequent occurrence of unusual events. To address this, intelligent surveillance systems leverage AI and computer vision to automatically detect anomalies. This study proposes an innovative method combining 3D convolutions and long short‐term memory (LSTM) modules to capture spatiotemporal features in video data. Notably, a structured coarse‐level feature fusion mechanism enhances generalization and mitigates the issue of vanishing gradients. Unlike traditional convolutional neural networks, the approach employs depth‐wise feature stacking, reducing computational complexity and enhancing the architecture. Additionally, it integrates microautoencoder blocks for downsampling, eliminates the computational load of ConvLSTM2D layers, and employs frequent feature concatenation blocks during upsampling to preserve temporal information. Integrating a Conv‐LSTM module at the down‐ and upsampling stages enhances the model's ability to capture short‐ and long‐term temporal features, resulting in a 42‐layer network while maintaining robust performance. Experimental results demonstrate significant reductions in false alarms and improved accuracy compared to contemporary methods, with enhancements of 2.7%, 0.6%, and 3.4% on the UCSDPed1, UCSDPed2, and Avenue datasets, respectively.
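To make the idea of downsampling with 3D convolutions and then concatenating features along the channel (depth) axis during upsampling more concrete, the following is a minimal shape-tracing sketch. It is not the authors' 42-layer network: the layer counts, channel widths, kernel size, strides, and the function names `conv3d_shape`, `upsample_shape`, `concat_channels`, and `trace_shapes` are all assumptions chosen for illustration. Conv-LSTM modules are left out because, with "same" padding, they would not change the tensor shapes traced here.

```python
from typing import Dict, Tuple

# Shape convention (assumed for this sketch): (channels, frames, height, width)
Shape = Tuple[int, int, int, int]


def conv3d_shape(shape: Shape, out_channels: int, kernel: int = 3,
                 stride: Tuple[int, int, int] = (1, 2, 2),
                 padding: int = 1) -> Shape:
    """Output shape of a 3D convolution: floor((d + 2p - k) / s) + 1 per axis."""
    _, t, h, w = shape
    dims = tuple((d + 2 * padding - kernel) // s + 1
                 for d, s in zip((t, h, w), stride))
    return (out_channels, dims[0], dims[1], dims[2])


def upsample_shape(shape: Shape, out_channels: int,
                   scale: Tuple[int, int, int] = (1, 2, 2)) -> Shape:
    """Output shape of an upsampling stage that multiplies each axis by `scale`."""
    _, t, h, w = shape
    return (out_channels, t * scale[0], h * scale[1], w * scale[2])


def concat_channels(a: Shape, b: Shape) -> Shape:
    """Depth-wise stacking: concatenate two feature maps along the channel axis."""
    if a[1:] != b[1:]:
        raise ValueError(f"cannot concatenate {a} and {b}: non-channel axes differ")
    return (a[0] + b[0], a[1], a[2], a[3])


def trace_shapes(input_shape: Shape) -> Dict[str, Shape]:
    """Trace shapes through a hypothetical two-level encoder/decoder with fusion."""
    shapes: Dict[str, Shape] = {"input": input_shape}
    # Downsampling path: spatial resolution halves, the frame axis is kept.
    shapes["down1"] = conv3d_shape(shapes["input"], 16)
    shapes["down2"] = conv3d_shape(shapes["down1"], 32)
    # Upsampling path: each stage is fused with the matching encoder feature map.
    shapes["up1"] = upsample_shape(shapes["down2"], 16)
    shapes["fuse1"] = concat_channels(shapes["up1"], shapes["down1"])
    shapes["up2"] = upsample_shape(shapes["fuse1"], 8)
    shapes["fuse2"] = concat_channels(shapes["up2"], shapes["input"])
    # Final stride-1 convolution maps the fused features back to one channel.
    shapes["output"] = conv3d_shape(shapes["fuse2"], 1, stride=(1, 1, 1))
    return shapes


if __name__ == "__main__":
    # Hypothetical clip: 1 channel, 8 frames, 64 x 64 pixels.
    for name, shape in trace_shapes((1, 8, 64, 64)).items():
        print(f"{name:>7}: {shape}")
```

In this sketch the frame axis is never strided, so every fused feature map still covers all input frames; this is one simple way to read the abstract's point that concatenation during upsampling helps preserve temporal information, not a description of the published architecture.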

Published in Advanced Intelligent Systems

ISSN: 2640-4567 (Online)
Publisher: Wiley
Country of publisher: Germany
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics: Computer engineering. Computer hardware; Technology: Mechanical engineering and machinery: Control engineering systems. Automatic machinery (General)
Website: https://onlinelibrary.wiley.com/journal/26404567
