IEEE Access (Jan 2021)

Metric-Based Attention Feature Learning for Video Action Recognition

  • Dae Ha Kim,
  • Fazliddin Anvarov,
  • Jun Min Lee,
  • Byung Cheol Song

DOI
https://doi.org/10.1109/ACCESS.2021.3064934
Journal volume & issue
Vol. 9
pp. 39218–39228

Abstract

Conventional approaches to video action recognition learn feature maps using 3D convolutional neural networks (CNNs). For better recognition, they train on large-scale video datasets to exploit the representation power of 3D CNNs. However, action recognition remains a challenging task: because previous methods rarely distinguish the human body from its environment, they often overfit to background scenes. Note that separating the human body from the background makes it possible to learn distinct representations of human action. This paper proposes a novel attention module that focuses only on action part(s) while neglecting non-action part(s) such as the background. First, the attention module employs a triplet loss to differentiate active features from non-active or less active features. Second, two attention modules based on the spatial and channel domains are proposed to enhance the feature representation ability for action recognition: the spatial attention module learns spatial correlations of features, and the channel attention module learns channel correlations. Experimental results show that the proposed method achieves state-of-the-art performance of 41.41% and 55.21% on the Diving48 and Something-V1 datasets, respectively. In addition, the proposed method provides competitive performance on the UCF-101 and HMDB-51 datasets, i.e., 95.83% on UCF-101 and 74.33% on HMDB-51.
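To make the abstract's ingredients concrete, the following is a minimal NumPy sketch (not the authors' actual implementation) of the three pieces it names: a channel attention gate, a spatial attention gate, and a triplet loss that pushes active features toward an anchor and non-active features away. The sigmoid gating and global-average pooling here are common simplifications assumed for illustration; the paper's modules are learned within a 3D CNN.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat):
    """Gate each channel of a (C, H, W) feature map.

    Global-average-pool every channel to a scalar descriptor,
    squash it with a sigmoid, and rescale that channel.
    """
    desc = feat.mean(axis=(1, 2))              # (C,) channel descriptors
    weights = sigmoid(desc)                    # (C,) gates in (0, 1)
    return feat * weights[:, None, None]       # broadcast over H, W

def spatial_attention(feat):
    """Gate each spatial location of a (C, H, W) feature map.

    Average across channels to get an (H, W) saliency map,
    squash with a sigmoid, and rescale every location.
    """
    amap = sigmoid(feat.mean(axis=0))          # (H, W) spatial gates
    return feat * amap[None, :, :]             # broadcast over channels

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss on flattened feature vectors.

    Encourages the positive (active) feature to be closer to the
    anchor than the negative (non-active) feature by `margin`.
    """
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)
```

Both attention functions preserve the feature-map shape, so they can be dropped between convolutional stages; the triplet loss is zero once the negative is at least `margin` farther from the anchor than the positive.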

Keywords