IEEE Access (Jan 2024)

Multimodal Human Action Recognition Framework Using an Improved CNNGRU Classifier

  • Mouazma Batool,
  • Moneerah Alotaibi,
  • Sultan Refa Alotaibi,
  • Dina Abdulaziz Alhammadi,
  • Muhammad Asif Jamal,
  • Ahmad Jalal,
  • Bumshik Lee

DOI
https://doi.org/10.1109/ACCESS.2024.3481631
Journal volume & issue
Vol. 12
pp. 158388 – 158406

Abstract


Activity recognition from multiple sensors is a promising research area with applications in remote human activity tracking for surveillance systems. Human activity recognition (HAR) aims to identify human actions and assign descriptors using diverse data modalities such as skeleton, RGB, depth, infrared, inertial, audio, Wi-Fi, and radar. This paper introduces a novel HAR system for multi-sensor surveillance that incorporates RGB, RGB-D, and inertial sensors. The pipeline frames and segments the multi-sensor data, reduces noise and inconsistencies through filtration, and extracts novel features, which are then arranged into a feature matrix. The novel features are the dynamic likelihood random field (DLRF), the angle along the sagittal plane (ASP), Lagregression (LR), and Gammatone cepstral coefficients (GCC). A genetic algorithm then merges and refines this matrix by eliminating redundant information. The fused data is finally classified with an improved Convolutional Neural Network - Gated Recurrent Unit (CNNGRU) classifier to recognize specific human actions. Experimental evaluation using leave-one-subject-out (LOSO) cross-validation on the Berkeley-MHAD, HWU-USP, UTD-MHAD, NTU-RGB+D60, and NTU-RGB+D120 benchmark datasets demonstrates that the proposed system outperforms existing state-of-the-art techniques, with accuracies of 97.91%, 97.99%, 97.90%, 96.61%, and 95.94%, respectively.
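
To make the classification stage concrete, the following is a minimal PyTorch sketch of a generic CNN-GRU sequence classifier of the kind the abstract describes: 1-D convolutions extract local temporal patterns from the fused feature sequence, and a GRU models longer-range dependencies before a final linear layer produces class scores. All names, layer sizes, and the assumed input shape (batch, time, fused feature dimension) are illustrative assumptions; the paper's specific "improved" CNNGRU architecture and the GA-fused feature matrix are not reproduced here.

```python
import torch
import torch.nn as nn


class CNNGRUClassifier(nn.Module):
    """Illustrative CNN-GRU classifier over fused multi-sensor feature sequences.

    Layer sizes and the default number of classes are assumptions for the sketch,
    not the configuration reported in the paper.
    """

    def __init__(self, feat_dim: int = 64, num_classes: int = 27,
                 conv_channels: int = 128, gru_hidden: int = 128):
        super().__init__()
        # 1-D convolutions over the temporal axis capture short-range motion patterns
        self.conv = nn.Sequential(
            nn.Conv1d(feat_dim, conv_channels, kernel_size=5, padding=2),
            nn.BatchNorm1d(conv_channels),
            nn.ReLU(),
            nn.MaxPool1d(kernel_size=2),
        )
        # A GRU aggregates the convolutional features across the whole sequence
        self.gru = nn.GRU(conv_channels, gru_hidden, batch_first=True)
        self.fc = nn.Linear(gru_hidden, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, feat_dim); Conv1d expects (batch, feat_dim, time)
        z = self.conv(x.transpose(1, 2))    # (batch, conv_channels, time // 2)
        z, _ = self.gru(z.transpose(1, 2))  # (batch, time // 2, gru_hidden)
        return self.fc(z[:, -1])            # last time step -> class logits


if __name__ == "__main__":
    # Example: a batch of 8 action clips, 60 frames each, 64-dim fused features per frame
    model = CNNGRUClassifier()
    logits = model(torch.randn(8, 60, 64))
    print(logits.shape)  # torch.Size([8, 27])
```

In a LOSO evaluation such as the one reported in the abstract, a model of this form would be retrained for each held-out subject and scored on that subject's clips, with the per-subject accuracies averaged into the final figure.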

Keywords