IEEE Access (Jan 2023)

A Comprehensive Survey of RGB-Based and Skeleton-Based Human Action Recognition

  • Cailing Wang
  • Jingjing Yan

DOI
https://doi.org/10.1109/ACCESS.2023.3282311
Journal volume & issue
Vol. 11
pp. 53880 – 53898

Abstract

With the advancement of computer vision, human action recognition (HAR) has shown broad research value and application prospects in fields such as intelligent security, autonomous driving, and human-machine interaction. Cameras and sensors capture several types of data, e.g., RGB, depth, skeleton, and infrared data, and based on the data type they use, HAR methods can be classified into RGB-based and skeleton-based methods. RGB data is easy and inexpensive to obtain, but RGB-based methods must cope with a large amount of irrelevant background information and are easily affected by factors such as lighting and shooting angle. Skeleton-based methods eliminate the impact of background variation and require little computation because their features are restricted to the skeleton, but they lack the contextual information necessary for HAR. This paper gives a thorough survey of these two approaches, covering deep learning methods, handcrafted feature extraction methods, common datasets, challenges, and future research directions. The section on skeleton-based action recognition methods also presents the most popular 2D and 3D pose estimation algorithms. This survey aims to give researchers, whether new to the area or engaged in long-term study, a selection of datasets and algorithms, as well as an overview of current issues and expected future directions in the field.

Keywords