SlowFast Multimodality Compensation Fusion Swin Transformer Networks for RGB-D Action Recognition

Xiongjiang Xiao; Ziliang Ren; Huan Li; Wenhong Wei; Zhiyong Yang; Huaide Yang

doi:10.3390/math11092115

Mathematics (Apr 2023)

Affiliations

Xiongjiang Xiao: School of Computer Science and Technology, Dongguan University of Technology, Dongguan 523820, China
Ziliang Ren: School of Computer Science and Technology, Dongguan University of Technology, Dongguan 523820, China
Huan Li: School of Computer Science and Technology, Dongguan University of Technology, Dongguan 523820, China
Wenhong Wei: School of Computer Science and Technology, Dongguan University of Technology, Dongguan 523820, China
Zhiyong Yang: School of Artificial Intelligence, Yantai Institute of Technology, Yantai 264003, China
Huaide Yang: School of Electronic Information, Dongguan Polytechnic, Dongguan 523109, China

Journal volume & issue: Vol. 11, no. 9
p. 2115

Abstract


RGB-D-based technology combines the advantages of RGB and depth sequences and can effectively recognize human actions in different environments. However, it is difficult for different modalities to learn spatio-temporal information effectively from each other. To enhance the information exchange between modalities, we introduce a SlowFast multimodality compensation block (SFMCB), which is designed to extract compensation features. Concretely, the SFMCB fuses features from two independent pathways with different frame rates into a single convolutional neural network to improve model performance. Furthermore, we explore two fusion schemes to combine the features from these two pathways. To facilitate feature learning across the independent pathways, multiple loss functions are used for joint optimization. To evaluate the effectiveness of the proposed architecture, we conducted experiments on four challenging datasets: NTU RGB+D 60, NTU RGB+D 120, THU-READ, and PKU-MMD. Experimental results demonstrate the effectiveness of the proposed model, which uses the SFMCB mechanism to capture complementary features of multimodal inputs.
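The abstract names three generic mechanisms: sampling one clip at two different frame rates (a slow and a fast pathway), fusing the features of the two pathways, and jointly optimizing several loss terms. The Python sketch below illustrates those mechanisms in isolation, under stated assumptions; it is not the authors' implementation. The clip length (64 frames), the slow/fast sample counts (8 and 32), the two fusion operators (element-wise sum and concatenation), and the equal per-branch loss weights are all hypothetical choices. The abstract does not specify which fusion schemes or loss weights the paper uses, and real features would be network tensors rather than short lists.

```python
import math
from typing import List, Sequence


def sample_indices(num_frames: int, num_samples: int) -> List[int]:
    """Uniformly pick num_samples frame indices from a clip of num_frames frames."""
    if not 0 < num_samples <= num_frames:
        raise ValueError("need 0 < num_samples <= num_frames")
    step = num_frames / num_samples
    return [int(i * step) for i in range(num_samples)]


def fuse_sum(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Hypothetical fusion option A: element-wise sum (requires equal widths)."""
    if len(a) != len(b):
        raise ValueError("sum fusion needs equal feature widths")
    return [x + y for x, y in zip(a, b)]


def fuse_concat(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Hypothetical fusion option B: channel-wise concatenation."""
    return list(a) + list(b)


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for one sample, computed stably via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def joint_loss(branch_logits: Sequence[Sequence[float]], label: int,
               weights: Sequence[float]) -> float:
    """Weighted sum of per-branch losses (e.g. slow, fast, fused); weights are hypothetical."""
    if len(branch_logits) != len(weights):
        raise ValueError("one weight per branch")
    return sum(w * cross_entropy(z, label) for z, w in zip(branch_logits, weights))


if __name__ == "__main__":
    # Hypothetical frame rates: the fast pathway sees 4x as many frames as the slow one.
    print(sample_indices(64, 8))    # [0, 8, 16, 24, 32, 40, 48, 56]
    print(fuse_sum([1.0, 2.0], [0.5, 0.5]))     # [1.5, 2.5]
    print(fuse_concat([1.0, 2.0], [0.5, 0.5]))  # [1.0, 2.0, 0.5, 0.5]
```

Summing all branch losses lets every pathway receive its own supervision signal while the fused prediction is trained at the same time. This is one common way to realize "multiple loss functions for joint optimization", assumed here for illustration only.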

Published in Mathematics

ISSN: 2227-7390 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Mathematics
Website: http://www.mdpi.com/journal/mathematics