Multi-scale Gated Graph Convolutional Network for Skeleton-based Action Recognition

GAN Chuang, WU Gui-xing, ZHAN Qing-yuan, WANG Peng-kun, PENG Zhi-lei

doi:10.11896/jsjkx.201100164

Jisuanji kexue (Jan 2022)

Affiliations

GAN Chuang, WU Gui-xing, ZHAN Qing-yuan, WANG Peng-kun, PENG Zhi-lei:
1 School of Software Engineering, University of Science and Technology of China, Suzhou, Jiangsu 215000, China
2 Suzhou Research Institute, University of Science and Technology of China, Suzhou, Jiangsu 215000, China

DOI: https://doi.org/10.11896/jsjkx.201100164
Journal volume & issue: Vol. 49, no. 1
pp. 181 – 186

Abstract

Skeleton-based human action recognition is attracting growing attention in computer vision. Recently, graph convolutional networks (GCNs), which are powerful for modeling non-Euclidean structured data, have achieved promising performance and enabled a new paradigm for action recognition. Existing approaches mostly model spatial dependencies with an emphasis mechanism, since the huge pre-defined graph contains a large amount of noise. However, simply emphasizing subsets is not optimal for reflecting the dynamic underlying correlations between vertices in a global manner. Furthermore, these methods are ineffective at capturing temporal dependencies, because CNNs and RNNs are not capable of modeling the intricate multi-range temporal relations. To address these issues, a multi-scale gated graph convolutional network (MSG-GCN) is proposed for skeleton-based action recognition. Specifically, a gated temporal convolution module (G-TCM) is presented to capture the consecutive short-term and interval long-term dependencies between vertices in the temporal domain. In addition, a multi-dimensional attention module covering the spatial, temporal, and channel dimensions, which enhances the expressiveness of the spatial graph, is integrated into the GCNs with negligible overhead. Extensive experiments on two large-scale benchmark datasets, NTU-RGB+D and Kinetics, demonstrate that our approach outperforms state-of-the-art baselines.
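
The abstract does not specify how G-TCM is built, so the following is only a minimal illustrative sketch of the general idea of a gated, multi-scale temporal convolution, not the authors' implementation. It assumes a common tanh/sigmoid gate, a dilation of 1 for consecutive short-term context, and a larger dilation for interval long-term context. All function names, kernel values, and the averaging of branches are hypothetical choices made for illustration.

```python
import math


def dilated_conv1d(x, kernel, dilation):
    """Zero-padded, same-length 1D convolution along the time axis.

    x: list of floats, one feature of one joint over T frames.
    kernel: list of weights with odd length.
    dilation: frame spacing between kernel taps (1 = consecutive frames).
    """
    half = len(kernel) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            idx = t + (i - half) * dilation
            if 0 <= idx < len(x):  # taps outside the sequence count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gated_temporal_conv(x, filt_kernel, gate_kernel, dilation):
    """Assumed gate form: tanh(filter branch) * sigmoid(gate branch)."""
    f = dilated_conv1d(x, filt_kernel, dilation)
    g = dilated_conv1d(x, gate_kernel, dilation)
    return [math.tanh(a) * sigmoid(b) for a, b in zip(f, g)]


def multi_scale_gated(x, filt_kernel, gate_kernel, dilations=(1, 3)):
    """Average gated branches with different dilations (hypothetical fusion)."""
    branches = [
        gated_temporal_conv(x, filt_kernel, gate_kernel, d) for d in dilations
    ]
    return [sum(vals) / len(branches) for vals in zip(*branches)]


if __name__ == "__main__":
    # Toy trajectory of one joint coordinate over 8 frames.
    signal = [0.0, 0.1, 0.4, 0.9, 0.4, 0.1, 0.0, 0.0]
    print(multi_scale_gated(signal, [0.5, 1.0, 0.5], [0.2, 0.6, 0.2]))
```

In this sketch the sigmoid branch acts as a soft switch on the filtered signal, and the two dilations give each frame both a neighboring-frame view and a wider, interval-sampled view of the sequence.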

Keywords: action recognition, skeleton modality, graph convolution, video classification, computer vision

Published in Jisuanji kexue

ISSN: 1002-137X (Print)
Publisher: Editorial office of Computer Science
Country of publisher: China
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software; Technology: Technology (General)
Website: http://www.jsjkx.com/CN/1002-137X/home.shtml
