Applied Sciences (Jan 2024)

A Multi-Scale Video Longformer Network for Action Recognition

  • Congping Chen,
  • Chunsheng Zhang,
  • Xin Dong

DOI: https://doi.org/10.3390/app14031061
Journal volume & issue: Vol. 14, no. 3, p. 1061

Abstract

Action recognition has found extensive applications in fields such as video classification and security monitoring. However, existing action recognition methods, such as those based on 3D convolutional neural networks, often struggle to capture comprehensive global information, while transformer-based approaches suffer from excessively high computational complexity. We introduce a Multi-Scale Video Longformer network (MSVL), built upon a 3D Longformer architecture whose "local attention + global features" mechanism reduces computational complexity while preserving global modeling capability. Specifically, MSVL gradually reduces the video feature resolution and increases the feature dimensions across four stages. In the lower stages (stage 1, stage 2), local window attention alleviates local redundancy and computational demands, while global tokens retain global features. In the higher stages (stage 3, stage 4), the local window attention evolves into dense attention, enhancing overall performance. Finally, extensive experiments on UCF101 (97.6%), HMDB51 (72.9%), and the assembly action dataset (100.0%) demonstrate the effectiveness and efficiency of MSVL.
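
To make the "local window attention + global token" idea from the abstract concrete, the following is a minimal PyTorch sketch of one lower-stage attention block. The window size, feature-map shapes, and the single shared learnable global token are illustrative assumptions for exposition, not the authors' implementation.

```python
# Sketch (assumed, not the authors' code): self-attention restricted to
# non-overlapping 3D windows, with a shared learnable global token prepended
# to every window so local queries can also attend to a global representation.
import torch
import torch.nn as nn


class LocalWindowGlobalAttention(nn.Module):
    def __init__(self, dim, num_heads=4, window=(2, 7, 7)):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # One learnable global token shared by all windows (simplification).
        self.global_token = nn.Parameter(torch.zeros(1, 1, dim))

    def forward(self, x):
        # x: (B, T, H, W, C) video feature map; T, H, W assumed divisible
        # by the window size.
        B, T, H, W, C = x.shape
        wt, wh, ww = self.window
        # Partition into non-overlapping 3D windows -> (B * nWin, wt*wh*ww, C)
        x = x.reshape(B, T // wt, wt, H // wh, wh, W // ww, ww, C)
        x = x.permute(0, 1, 3, 5, 2, 4, 6, 7).reshape(-1, wt * wh * ww, C)
        # Prepend the global token to every window, then attend within windows.
        g = self.global_token.expand(x.size(0), -1, -1)
        x = torch.cat([g, x], dim=1)
        x, _ = self.attn(x, x, x)
        # Drop the global position and restore the (B, T, H, W, C) layout.
        x = x[:, 1:].reshape(B, T // wt, H // wh, W // ww, wt, wh, ww, C)
        x = x.permute(0, 1, 4, 2, 5, 3, 6, 7).reshape(B, T, H, W, C)
        return x


if __name__ == "__main__":
    feat = torch.randn(2, 4, 14, 14, 96)       # assumed lower-stage feature map
    out = LocalWindowGlobalAttention(96)(feat)
    print(out.shape)                            # torch.Size([2, 4, 14, 14, 96])
```

In the higher stages described in the abstract, this windowed attention would be replaced by dense attention over all tokens, which is affordable there because the feature resolution has already been reduced.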

Keywords