IEEE Access (Jan 2024)

KianNet: A Violence Detection Model Using an Attention-Based CNN-LSTM Structure

  • Soheil Vosta,
  • Kin-Choong Yow

DOI
https://doi.org/10.1109/ACCESS.2023.3339379
Journal volume & issue
Vol. 12
pp. 2198–2209

Abstract


Violent behaviour is a persistent threat to any society, so many organizations use surveillance cameras to monitor for such events, preserve public safety, and mitigate potential harm. Because it is difficult for human operators to monitor the copious camera feeds manually, automated systems are employed to improve the accuracy of violence detection and reduce errors. In this paper, we propose a novel model named KianNet that effectively detects violent incidents in recorded video by combining ResNet50 and ConvLSTM architectures with a multi-head self-attention layer. ResNet50 enables robust feature extraction, while ConvLSTM exploits the temporal dependencies in the video sequences. Furthermore, the multi-head self-attention layer enhances the model's ability to focus on relevant spatiotemporal regions and improves its discriminative capacity. Empirical investigations confirm that the proposed model outperforms its competitors by roughly 10 percent, achieving a 97.48% AUC on binary classification on the UCF-Crime dataset and 96.21% accuracy on the RWF dataset, surpassing Violence 4D.
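To make the attention component concrete: the multi-head self-attention layer described above lets the model weigh each frame-level feature vector against every other one in the clip. The sketch below is not the authors' implementation; it is a minimal numpy illustration of multi-head self-attention over a sequence of per-frame features, with random matrices standing in for the learned projection weights (the sequence length 16 and feature size 64 are arbitrary assumptions for the example).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, num_heads, rng):
    """Illustrative multi-head self-attention.

    x: (seq_len, d_model) array, e.g. one feature vector per video frame.
    Returns the attended features and the per-head attention weights.
    """
    seq_len, d_model = x.shape
    assert d_model % num_heads == 0, "d_model must divide evenly across heads"
    d_head = d_model // num_heads

    # Random projections stand in for learned Q/K/V/output weights.
    Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                      for _ in range(4))

    def split_heads(m):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return m.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split_heads(x @ Wq), split_heads(x @ Wk), split_heads(x @ Wv)

    # Scaled dot-product attention, computed independently per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)   # (heads, seq, seq)
    attn = softmax(scores, axis=-1)                        # rows sum to 1
    out = (attn @ v).transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ Wo, attn

# Example: 16 frames, 64-dim features per frame, 4 heads.
rng = np.random.default_rng(0)
frames = rng.standard_normal((16, 64))
features, weights = multi_head_self_attention(frames, num_heads=4, rng=rng)
```

In KianNet the inputs to this layer would be the spatiotemporal features produced by the ResNet50/ConvLSTM stages rather than random vectors, and the projection matrices would be trained end to end.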

Keywords