IET Image Processing (Jun 2024)

3sG: Three‐stage guidance for indoor human action recognition

  • Hai Nan,
  • Qilang Ye,
  • Zitong Yu,
  • Kang An

DOI
https://doi.org/10.1049/ipr2.13078
Journal volume & issue
Vol. 18, no. 8
pp. 2000–2010

Abstract

Inference that uses the skeleton to steer RGB videos is well suited to the fine‐grained activities found in indoor human action recognition (IHAR). However, existing methods that explore only spatial alignment are prone to bias, which limits performance. The authors propose a Three‐stage Guidance (3sG) framework that leverages skeleton knowledge to promote the RGB modality in three stages. First, a soft shading image is proposed to alleviate background noise in videos, allowing the network to focus directly on the motion region. Second, the authors propose extracting RGB frames of interest to reduce the computational effort. Furthermore, to exploit the complementary information between skeletons and RGB more fully, the skeleton is coupled to the frame representation in a different spatial–temporal sharing pattern. Third, the global skeleton features and the skeleton‐guided RGB features are fed into shared classifiers, which approximate the logit distributions of the two to enhance the performance of the RGB unimodal branch. Finally, a fusion strategy that uses two learnable parameters to adaptively integrate the skeleton with the RGB is proposed. 3sG outperforms state‐of‐the‐art results on the Toyota Smarthome dataset while being more efficient than similar methods on the NTU RGB+D dataset.
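The third stage and the final fusion step can be pictured with a minimal sketch. The snippet below is an illustrative PyTorch assumption, not the authors' implementation: the module name SharedClassifierFusion, the KL-divergence used to approximate the two logit distributions, and the scalar parameters alpha and beta are all stand-ins inferred from the abstract.

```python
# Minimal sketch (assumptions only): a classifier shared by the global
# skeleton stream and the skeleton-guided RGB stream, a KL-style term that
# pulls the RGB logit distribution towards the skeleton one, and an
# adaptive fusion driven by two learnable parameters.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedClassifierFusion(nn.Module):
    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        # Single classifier shared by both streams (assumption from the abstract).
        self.classifier = nn.Linear(feat_dim, num_classes)
        # Two learnable fusion parameters, one per modality.
        self.alpha = nn.Parameter(torch.ones(1))
        self.beta = nn.Parameter(torch.ones(1))

    def forward(self, skel_feat: torch.Tensor, rgb_feat: torch.Tensor):
        skel_logits = self.classifier(skel_feat)  # global skeleton branch
        rgb_logits = self.classifier(rgb_feat)    # skeleton-guided RGB branch
        # Hypothetical alignment term: approximate the skeleton logit
        # distribution with the RGB one (skeleton side detached).
        align_loss = F.kl_div(
            F.log_softmax(rgb_logits, dim=-1),
            F.softmax(skel_logits.detach(), dim=-1),
            reduction="batchmean",
        )
        # Adaptive fusion of the two streams with learnable weights.
        fused_logits = self.alpha * skel_logits + self.beta * rgb_logits
        return fused_logits, align_loss


if __name__ == "__main__":
    # e.g. 31 action classes as in Toyota Smarthome; feature size is arbitrary.
    model = SharedClassifierFusion(feat_dim=256, num_classes=31)
    skel = torch.randn(4, 256)
    rgb = torch.randn(4, 256)
    logits, loss = model(skel, rgb)
    print(logits.shape, loss.item())
```

In this reading, the alignment term would be added to the classification loss during training, while at test time only the fused logits (or the strengthened RGB branch alone) are used; the exact weighting and training schedule are described in the paper itself.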

Keywords