IEEE Access (Jan 2024)
PH-GCN: Boosting Human Action Recognition Through Multi-Level Granularity With Pair-Wise Hyper GCN
Abstract
Recently, there has been a surge of interest in using Graph Convolutional Networks (GCNs) for skeleton-based action recognition, where learning effective representations of the skeletal graph is essential to success. GCNs typically employ the Message-Passing Mechanism (MPM) to learn node embeddings: recursive neighborhood aggregation iteratively computes new feature vectors on arbitrary graphs. In this mechanism, pairwise associations between adjacent nodes drive feature aggregation and are therefore important for extracting representative features. However, the relationships between joints in the skeletal graph are complex and do not depend solely on structural adjacency, which makes it difficult to capture the semantic relationships among joints in the skeleton. In this paper, inspired by hyper-graph edges from graph theory, we propose a disassembled hyper-graph (DH-Graph) to address the underlying issue of distant associations. The DH-Graph is constructed in several steps. First, we decompose the arbitrary graph into overlapping hyper-edge groups according to their semantic relationships and their importance to the action recognition task. Then, the groups are organized into a hierarchy to account for multi-level granularity. Finally, pairwise connections are established between adjacent and distant joints within adjacent hierarchy sets to capture latent composite correlations in each specific semantic space of the human skeleton. Using our DH-Graph, we apply a GCN mechanism in the spatial domain to learn a general data representation, which results in the Pair-wise Hyper Hierarchical GCN (PH-GCN). Additionally, we introduce a HyperAttention module, built on Multi-scale Representative Spatial Average Pooling (MS-RSAP) and edge convolution, to highlight informative hyper-hierarchical sets.
Extensive experiments show that the proposed PH-GCN achieves remarkable performance on two challenging datasets, NTU RGB+D and NTU-120 RGB+D.
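The two ideas the abstract combines, pairwise connections between adjacent hierarchy sets and neighborhood aggregation by message passing, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the five-joint toy skeleton, the three hierarchy sets, the function names (`pairwise_edges`, `normalized_adjacency`, `aggregate`), and the choice of the common symmetric normalization D^{-1/2}(A + I)D^{-1/2}. The actual PH-GCN groups are overlapping and semantically chosen, and the HyperAttention module is not modeled.

```python
from itertools import product
from math import sqrt


def pairwise_edges(hierarchy):
    """Connect every joint in one hierarchy set to every joint in the next set."""
    edges = set()
    for upper, lower in zip(hierarchy, hierarchy[1:]):
        for i, j in product(upper, lower):
            if i != j:  # a joint shared by two sets gets no self-edge here
                edges.add((min(i, j), max(i, j)))
    return edges


def normalized_adjacency(num_nodes, edges):
    """Build D^{-1/2} (A + I) D^{-1/2} as a dense list of lists."""
    a = [[1.0 if i == j else 0.0 for j in range(num_nodes)]
         for i in range(num_nodes)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]  # degree including the self-loop
    return [[a[i][j] / sqrt(deg[i] * deg[j]) for j in range(num_nodes)]
            for i in range(num_nodes)]


def aggregate(adj, features):
    """One message-passing step: weighted sum of each node's neighbor features."""
    dim = len(features[0])
    return [[sum(adj[i][k] * features[k][c] for k in range(len(features)))
             for c in range(dim)]
            for i in range(len(adj))]


# Hypothetical toy skeleton: 0 = torso, 1/2 = shoulders, 3/4 = hands.
# The sets are ordered from the body center outward (an assumed hierarchy).
hierarchy = [[0], [1, 2], [3, 4]]
edges = pairwise_edges(hierarchy)
adj = normalized_adjacency(5, edges)

# A one-hot feature on joint 3 shows where its information travels in one step.
features = [[0.0], [0.0], [0.0], [1.0], [0.0]]
out = aggregate(adj, features)
```

In this toy setup, joint 3 is linked to both joints of the neighboring set (1 and 2), so one aggregation step carries its feature to joint 2 even if no bone joins them, while joint 4, which sits in the same set, receives nothing directly. This is only meant to show how connections beyond structural adjacency change what a single message-passing step can reach.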
Keywords