IEEE Access (Jan 2025)
Spatiotemporal Semantic Modeling and Cross-Modal Collaboration-Based Gait Recognition Under Multiple Views and Various Walking Conditions
Abstract
With the increasing demand for non-intrusive, long-distance identity recognition in fields such as public safety, medical rehabilitation, and industrial energy, gait recognition technology has become increasingly important. However, existing gait recognition methods rely predominantly on unimodal data and integrate multimodal information insufficiently, which prevents them from fully exploiting the complementary advantages of different modalities. Furthermore, variations in viewing angle, walking state, and gait characteristics significantly affect recognition accuracy and robustness. In this paper, a spatiotemporal semantic modeling and cross-modal collaboration approach is proposed for gait recognition under multiple views and various walking conditions, integrating the temporal and spatial dimensions of silhouette, skeleton, Gait Energy Image (GEI), and Skeleton Energy Image (SEI) data. The method comprises a Dual-Branch Attention Module (DBAM), a Multi-Branch Spatio-Temporal Modeling Module (MBSTM), and a Multi-modal Feature Fusion Module (MFFM). This approach effectively captures key features across viewpoints and walking conditions while preserving local channel integrity; by modeling spatiotemporal information across multiple branches, it extracts high-level semantic features and fuses multimodal data to enhance gait recognition performance. Experiments demonstrate that the proposed method achieves Rank-1 accuracies of 96.78%, 92.63%, and 76.28% under multiple views for the normal, bag-carrying, and coat-wearing walking conditions, respectively, on the CASIA-B dataset, and an average accuracy of 99.3% on the CASIA-C dataset, indicating strong generalization capability.
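As background for the GEI and SEI inputs mentioned above: a Gait Energy Image is conventionally obtained by averaging aligned binary silhouette frames over a gait cycle, and a Skeleton Energy Image can be formed analogously from rendered skeleton maps. The following is a minimal illustrative sketch, not the authors' implementation; the array names, shapes, and the `energy_image` helper are assumptions for demonstration only.

```python
# Illustrative sketch (assumed interfaces, not the paper's code):
# `silhouettes` is a (T, H, W) stack of aligned binary silhouette frames from
# one gait cycle; `skeleton_maps` is a (T, H, W) stack of rendered skeleton maps.
import numpy as np

def energy_image(frames: np.ndarray) -> np.ndarray:
    """Average a (T, H, W) stack of frames into a single (H, W) template."""
    return frames.astype(np.float32).mean(axis=0)

# Random placeholder data standing in for a real, preprocessed gait sequence.
T, H, W = 30, 64, 44
silhouettes = (np.random.rand(T, H, W) > 0.5).astype(np.uint8)
skeleton_maps = np.random.rand(T, H, W).astype(np.float32)

gei = energy_image(silhouettes)    # Gait Energy Image
sei = energy_image(skeleton_maps)  # Skeleton Energy Image
print(gei.shape, sei.shape)        # (64, 44) (64, 44)
```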
Keywords