IEEE Access (Jan 2024)
Hybrid Transformer-EfficientNet Model for Robust Human Activity Recognition: The BiTransAct Approach
Abstract
Human Activity Recognition (HAR) is employed in a range of applications, including sports analytics, healthcare monitoring, surveillance, and human-computer interaction. Despite a decade of research on HAR, existing models still struggle with challenges such as occlusion, computational efficiency constraints, and capturing long-term temporal dependencies. To address these shortcomings, we present BiTransAct, a novel hybrid model that combines EfficientNet-B0 for spatial feature extraction with a Transformer encoder that captures temporal relationships in video data. We evaluate the proposed model on SPHAR-Dataset-1.0, a video-based dataset containing 7,759 videos spanning 14 diverse activities and 421,441 samples. Our experiments establish that BiTransAct consistently outperforms other deep learning models such as Swin, EfficientNet, and RegNet in both classification accuracy and precision. Its ability to handle large datasets without compromising performance makes it a strong candidate for real-time HAR tasks. Furthermore, its self-attention mechanism and dynamic learning rate make BiTransAct more robust and help it avoid overfitting. The results demonstrate that BiTransAct provides a scalable, efficient solution for HAR applications, with particular relevance to real-world scenarios such as video surveillance and healthcare monitoring.
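To make the described architecture concrete, the following is a minimal PyTorch sketch of a hybrid of this kind: EfficientNet-B0 extracts per-frame spatial features, which a Transformer encoder then attends over across time. This is an illustration under assumptions, not the authors' implementation; the embedding dimension, head/layer counts, and mean-pooled readout are hypothetical choices.

```python
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b0

class HybridCNNTransformer(nn.Module):
    """Sketch of an EfficientNet-B0 + Transformer-encoder HAR model (assumed design)."""
    def __init__(self, num_classes=14, d_model=256, nhead=8, num_layers=4):
        super().__init__()
        backbone = efficientnet_b0(weights="IMAGENET1K_V1")
        # Convolutional features + global average pooling -> one 1280-d vector per frame
        self.cnn = nn.Sequential(backbone.features, nn.AdaptiveAvgPool2d(1))
        self.proj = nn.Linear(1280, d_model)  # project frame features to token size
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True
        )
        self.temporal = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, x):
        # x: (batch, frames, channels, height, width) video clip
        b, t, c, h, w = x.shape
        feats = self.cnn(x.view(b * t, c, h, w)).flatten(1)  # per-frame spatial features
        tokens = self.proj(feats).view(b, t, -1)             # sequence of frame tokens
        tokens = self.temporal(tokens)                       # self-attention over time
        return self.head(tokens.mean(dim=1))                 # clip-level class logits

# Example: classify a batch of two 16-frame clips of 224x224 RGB frames
model = HybridCNNTransformer(num_classes=14)
logits = model(torch.randn(2, 16, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 14])
```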
Keywords