Mathematics (Apr 2023)

SlowFast Multimodality Compensation Fusion Swin Transformer Networks for RGB-D Action Recognition

  • Xiongjiang Xiao,
  • Ziliang Ren,
  • Huan Li,
  • Wenhong Wei,
  • Zhiyong Yang,
  • Huaide Yang

DOI
https://doi.org/10.3390/math11092115
Journal volume & issue
Vol. 11, no. 9
p. 2115

Abstract

RGB-D-based technology combines the advantages of RGB and depth sequences, which allows human actions to be recognized effectively in different environments. However, it is difficult for the different modalities to learn spatio-temporal information from each other. To enhance the information exchange between modalities, we introduce a SlowFast multimodality compensation block (SFMCB) designed to extract compensation features. Concretely, the SFMCB fuses features from two independent pathways with different frame rates into a single convolutional neural network to improve model performance. Furthermore, we explore two fusion schemes for combining the features from the two pathways. To facilitate feature learning across the multiple independent pathways, multiple loss functions are used for joint optimization. To evaluate the proposed architecture, we conducted experiments on four challenging datasets: NTU RGB+D 60, NTU RGB+D 120, THU-READ, and PKU-MMD. The experimental results demonstrate the effectiveness of the proposed model, which uses the SFMCB mechanism to capture complementary features of multimodal inputs.
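
As a rough illustration of the ideas named in the abstract (two pathways with different frame rates, a fusion step, and a joint objective over several losses), the following is a minimal sketch in plain Python. It is not the authors' implementation. The abstract does not state the sampling strides, the two fusion schemes, or the form of the loss functions, so the stride values, the choice of element-wise summation and concatenation as fusion examples, the weighted-sum loss, and all function names below are hypothetical assumptions made for illustration only.

```python
from typing import List, Optional, Sequence, Tuple


def sample_pathways(num_frames: int, slow_stride: int = 8,
                    speed_ratio: int = 4) -> Tuple[List[int], List[int]]:
    """Pick frame indices for a slow and a fast pathway (assumed strides).

    The fast pathway samples `speed_ratio` times more densely than the slow one.
    """
    fast_stride = max(1, slow_stride // speed_ratio)
    slow = list(range(0, num_frames, slow_stride))
    fast = list(range(0, num_frames, fast_stride))
    return slow, fast


def fuse_sum(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Assumed fusion scheme 1: element-wise summation of two feature vectors."""
    if len(a) != len(b):
        raise ValueError("summation fusion needs equally sized features")
    return [x + y for x, y in zip(a, b)]


def fuse_concat(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Assumed fusion scheme 2: concatenation of two feature vectors."""
    return list(a) + list(b)


def joint_loss(losses: Sequence[float],
               weights: Optional[Sequence[float]] = None) -> float:
    """Combine per-pathway losses as a weighted sum (assumed form)."""
    if weights is None:
        weights = [1.0] * len(losses)
    if len(weights) != len(losses):
        raise ValueError("one weight per loss is required")
    return sum(w * l for w, l in zip(weights, losses))


# Example usage with toy values standing in for real features and losses.
slow_idx, fast_idx = sample_pathways(num_frames=32)
summed = fuse_sum([1.0, 2.0], [0.5, 0.5])
stacked = fuse_concat([1.0, 2.0], [0.5, 0.5])
total = joint_loss([1.0, 2.0, 3.0])
```

In a real network the lists would be feature tensors produced by the RGB and depth streams, and each loss term would typically be a classification loss on one pathway or on the fused output; those details would need to be taken from the full paper.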

Keywords