Heliyon (Aug 2024)
Volleyball training video classification description using the BiLSTM fusion attention mechanism
Abstract
This study explores methods for classifying and describing volleyball training videos using deep learning techniques. By developing a model that integrates Bi-directional Long Short-Term Memory (BiLSTM) and attention mechanisms, referred to as BiLSTM-Multimodal Attention Fusion Temporal Classification (BiLSTM-MAFTC), the study enhances the accuracy and efficiency of volleyball video content analysis. The model first encodes features from various modalities into feature vectors, capturing different types of information such as positional and modal data. A BiLSTM network then models the multimodal temporal information, while spatial and channel attention mechanisms are combined to form a dual-attention module. This module establishes correlations between features of different modalities, extracting valuable information from each modality and uncovering complementary information across modalities. Extensive experiments validate the method's effectiveness and state-of-the-art performance. Compared to conventional recurrent neural network algorithms, the model achieves recognition accuracies exceeding 95% under both Top-1 and Top-5 metrics for action recognition, with a recognition speed of 0.04 s per video. The study demonstrates that the model can effectively process and analyze multimodal temporal information, including athlete movements, positional relationships on the court, and ball trajectories. Consequently, precise classification and description of volleyball training videos are achieved. This advancement improves the efficiency of coaches and athletes in volleyball training and provides valuable insights for broader sports video analysis research.
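The Top-1 and Top-5 metrics quoted above count a video as correctly recognized when its true action class is among the one or five highest-scoring classes, respectively. The following is a minimal sketch of that computation, not the evaluation code used in the study: the function name `top_k_accuracy`, the six-class setup, and the toy scores are all illustrative assumptions.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]], labels: Sequence[int], k: int
) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")

    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered by descending score; keep only the first k.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy class scores for three clips over six hypothetical action classes.
scores = [
    [0.70, 0.10, 0.05, 0.05, 0.05, 0.05],  # true class 0: ranked first
    [0.30, 0.40, 0.10, 0.10, 0.05, 0.05],  # true class 0: ranked second
    [0.30, 0.25, 0.20, 0.15, 0.08, 0.02],  # true class 5: ranked last
]
labels = [0, 0, 5]

top1 = top_k_accuracy(scores, labels, k=1)
top5 = top_k_accuracy(scores, labels, k=5)
print(f"Top-1: {top1:.3f}, Top-5: {top5:.3f}")
```

In this toy case only the first clip counts under Top-1, while the first two count under Top-5, which illustrates why Top-5 accuracy is never lower than Top-1 accuracy on the same predictions.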