International Journal of Technology (Oct 2023)
A Combination of Light Pre-trained Convolutional Neural Networks and Long Short-Term Memory for Real-Time Violence Detection in Videos
Abstract
Machine learning techniques have been used widely to analyze videos that have scenes of violence for censorship or surveillance purposes. Violence detection plays a crucial role in preventing underage and teenage exposure to violent acts and ensuring a safer viewing environment. The automatic identification of violent scenes is significant to classify videos into two classes, including violence and non-violence. The existing violence detection models suffer from several problems, including memory inefficiency and low-speed inference, and thus make them unsuitable to be implemented on embedding systems with limited resources. This article aims to propose a novel combination of light Convolutional Neural Networks (CNN), namely EfficientNet-B0 and Long Short-Term Memory (LSTM). The public dataset which consists of two different datasets, was utilized to train, evaluate, and compare the deep learning models used in this study. The experimental results show the superiority of EfficientNet B0-LSTM, which outperform other models in terms of accuracy (86.38%), F1 score (86.39%), and False Positive Rate (13.53%). Additionally, the proposed model has been deployed to a low-cost embedding device such as Raspberry Pi for real-time violence detection.
Keywords