IEEE Access (Jan 2024)
Hybrid Transformer-EfficientNet Model for Robust Human Activity Recognition: The BiTransAct Approach
Abstract
Human Activity Recognition (HAR) is employed in a range of applications, including sports analytics, healthcare monitoring, surveillance, and human-computer interaction. Despite a decade of research on HAR, existing models still struggle with challenges such as occlusion, computational efficiency constraints, and capturing long-term temporal dependencies. To address these shortcomings, we present BiTransAct, a novel hybrid model that combines EfficientNet-B0 for spatial feature extraction with a Transformer encoder that captures temporal relationships in video data. We evaluate the proposed model on SPHAR-Dataset-1.0, a video-based dataset containing 7,759 videos spanning 14 diverse activities and 421,441 samples. Our experiments establish that BiTransAct consistently outperforms other deep learning models such as Swin, EfficientNet, and RegNet in both classification accuracy and precision. Its ability to handle large datasets without compromising performance makes it a strong candidate for real-time HAR tasks. Furthermore, its self-attention mechanism and dynamic learning rate make BiTransAct more robust and help it avoid overfitting. The results demonstrate that BiTransAct provides a scalable, efficient solution for HAR applications, with particular relevance to real-world scenarios such as video surveillance and healthcare monitoring.
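To make the described architecture concrete, the following is a minimal PyTorch sketch of a hybrid of this kind: EfficientNet-B0 extracts per-frame spatial features, which a Transformer encoder then attends over across time. This is an illustration under assumptions, not the authors' implementation; the embedding dimension, head/layer counts, and mean-pooled readout are hypothetical choices.

```python
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b0

class HybridCNNTransformer(nn.Module):
    """Sketch of an EfficientNet-B0 + Transformer-encoder HAR model (assumed design)."""
    def __init__(self, num_classes=14, d_model=256, nhead=8, num_layers=4):
        super().__init__()
        backbone = efficientnet_b0(weights="IMAGENET1K_V1")
        # Convolutional features + global average pooling -> one 1280-d vector per frame
        self.cnn = nn.Sequential(backbone.features, nn.AdaptiveAvgPool2d(1))
        self.proj = nn.Linear(1280, d_model)  # project frame features to token size
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True
        )
        self.temporal = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, x):
        # x: (batch, frames, channels, height, width) video clip
        b, t, c, h, w = x.shape
        feats = self.cnn(x.view(b * t, c, h, w)).flatten(1)  # per-frame spatial features
        tokens = self.proj(feats).view(b, t, -1)             # sequence of frame tokens
        tokens = self.temporal(tokens)                       # self-attention over time
        return self.head(tokens.mean(dim=1))                 # clip-level class logits

# Example: classify a batch of two 16-frame clips of 224x224 RGB frames
model = HybridCNNTransformer(num_classes=14)
logits = model(torch.randn(2, 16, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 14])
```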
Keywords