A Real-Time 3-Dimensional Object Detection Based Human Action Recognition Model

Chhaya Gupta; Nasib Singh Gill; Preeti Gulia; Sangeeta Yadav; Giovanni Pau; Mohammad Alibakhshikenari; Xiangjie Kong

doi:10.1109/ojcs.2023.3334528

IEEE Open Journal of the Computer Society (Jan 2024)

Affiliations

Chhaya Gupta: Department of Computer Science and Applications, Maharshi Dayanand University, Rohtak, India
Nasib Singh Gill: Department of Computer Science and Applications, Maharshi Dayanand University, Rohtak, India
Preeti Gulia: Department of Computer Science and Applications, Maharshi Dayanand University, Rohtak, India
Sangeeta Yadav: Department of Computer Science and Applications, Maharshi Dayanand University, Rohtak, India
Giovanni Pau: Faculty of Engineering and Architecture, Kore University, Enna, Italy
Mohammad Alibakhshikenari: Department of Signal Theory Communications, Universidad Carlos III Madrid, Getafe, Spain
Xiangjie Kong: College of Computer Science and Technology, Zhejiang University, Hangzhou, China

DOI: https://doi.org/10.1109/ojcs.2023.3334528
Journal volume & issue: Vol. 5, pp. 14–26

Abstract

Computer vision technologies have improved greatly in the last few years, and many problems have been solved by combining deep learning with greater computational power. Action recognition is one problem of societal importance that still needs to be addressed. Human Action Recognition (HAR) can be adopted in intelligent video surveillance systems, and governments may use it for crime monitoring and security purposes. This paper proposes a deep learning-based HAR model, namely a 3-dimensional convolutional network with a multiplicative LSTM. The suggested model makes it easier to understand the tasks that an individual or a team of individuals performs. The proposed four-phase model consists of a 3D convolutional neural network (3DCNN) combined with a multiplicative LSTM recurrent network, together with YOLOv6 for real-time object detection. Its four stages are data fusion, feature extraction, object identification, and skeleton articulation. The NTU-RGB-D, KITTI, NTU-RGB-D 120, UCF 101, and Fused datasets are among those used to train the model. The suggested model surpasses other cutting-edge models, reaching accuracies of 98.23%, 97.65%, 98.76%, 95.45%, and 97.65% on these datasets, respectively. The state-of-the-art (SOTA) methods compared in this study are a traditional CNN, YOLOv6, and a CNN with BiLSTM. The results verify that the proposed model, which combines all these techniques, classifies actions more accurately than existing ones.
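The abstract names a multiplicative LSTM but does not give its equations. As a rough illustration only, the sketch below implements one recurrent step of the commonly published multiplicative LSTM formulation, in which an intermediate state m = (W_mx x) * (W_mh h_prev) replaces the previous hidden state in every gate. It is not the authors' implementation: the function names (`init_params`, `mlstm_step`), the parameter layout, the sizes, and the random inputs standing in for per-clip 3DCNN features are all assumptions made for this example.

```python
import math
import random


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_params(n_in, n_hid, seed=0):
    """Hypothetical parameter layout: small random weights, no biases."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    params = {"mx": mat(n_hid, n_in), "mh": mat(n_hid, n_hid)}
    for gate in ("i", "f", "o", "c"):
        params[gate + "x"] = mat(n_hid, n_in)   # input-to-gate weights
        params[gate + "m"] = mat(n_hid, n_hid)  # intermediate-state-to-gate weights
    return params


def mlstm_step(params, x, h_prev, c_prev):
    """One multiplicative-LSTM step; returns the new hidden and cell states."""
    # Multiplicative intermediate state: elementwise product of two projections.
    m = [a * b for a, b in zip(matvec(params["mx"], x), matvec(params["mh"], h_prev))]

    def pre(gate):
        # Pre-activation of a gate, driven by the input x and by m (not h_prev).
        return [a + b for a, b in zip(matvec(params[gate + "x"], x),
                                      matvec(params[gate + "m"], m))]

    i = [sigmoid(z) for z in pre("i")]    # input gate
    f = [sigmoid(z) for z in pre("f")]    # forget gate
    o = [sigmoid(z) for z in pre("o")]    # output gate
    g = [math.tanh(z) for z in pre("c")]  # candidate cell update

    c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c_prev, i, g)]
    h = [math.tanh(cv) * ov for cv, ov in zip(c, o)]
    return h, c


# Usage: run the cell over a short sequence of made-up feature vectors.
n_in, n_hid = 4, 3
params = init_params(n_in, n_hid)
rng = random.Random(1)
h, c = [0.0] * n_hid, [0.0] * n_hid
for _ in range(5):
    x = [rng.uniform(-1.0, 1.0) for _ in range(n_in)]
    h, c = mlstm_step(params, x, h, c)
```

In a full pipeline of the kind the abstract describes, the final hidden state would typically feed a classifier over action labels; that step, like the 3DCNN and YOLOv6 stages, is omitted here.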

Published in IEEE Open Journal of the Computer Society

ISSN: 2644-1268 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science; Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=8782664
