IEEE Access (Jan 2023)

Focal Channel Knowledge Distillation for Multi-Modality Action Recognition

  • Lipeng Gan,
  • Runze Cao,
  • Ning Li,
  • Man Yang,
  • Xiaochao Li

DOI
https://doi.org/10.1109/ACCESS.2023.3298647
Journal volume & issue
Vol. 11
pp. 78285 – 78298

Abstract


Multi-modality action recognition aims to learn complementary information from multiple modalities to improve action recognition performance. However, significant channel differences exist across modalities; transferring channel semantic features equally from multiple modalities to RGB leads to competition and redundancy during knowledge distillation. To address this issue, we propose a focal channel knowledge distillation strategy that transfers the key semantic correlations and distributions of multi-modality teachers to the RGB student network. The focal channel correlations provide the intrinsic relationships and diversity properties of key semantics, and the focal channel distributions provide the salient channel activations of features. By ignoring less-discriminative and irrelevant channels, the student can use its channel capacity more efficiently to learn complementary semantic features from the other modalities. Our focal channel knowledge distillation achieves 91.2%, 95.6%, 98.3%, and 81.0% accuracy on the NTU 60 (CS), UTD-MHAD, N-UCLA, and HMDB51 datasets, improvements of 4.5%, 4.2%, 3.7%, and 7.1% over unimodal RGB models. The focal channel knowledge distillation framework can also be integrated with unimodal models to achieve state-of-the-art performance. Extensive experiments show that the proposed method achieves 92.5%, 96.0%, 98.9%, and 82.3% accuracy on the NTU 60 (CS), UTD-MHAD, N-UCLA, and HMDB51 datasets, respectively.
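To make the idea concrete, the following is a minimal PyTorch-style sketch of what a focal channel distillation loss could look like, assuming teacher and student feature maps with matching channel counts. The function name `focal_channel_kd`, the `focal_ratio` and `tau` parameters, and the specific loss choices (KL divergence for channel distributions, MSE on inter-channel correlation matrices) are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of focal channel knowledge distillation; not the paper's code.
import torch
import torch.nn.functional as F

def focal_channel_kd(f_student, f_teacher, focal_ratio=0.5, tau=4.0):
    """Distill only the most salient teacher channels into the student.

    f_student, f_teacher: feature maps of shape (B, C, H, W) with matching C.
    """
    B, C, H, W = f_teacher.shape

    # Channel saliency from the teacher: mean absolute activation per channel.
    saliency = f_teacher.abs().mean(dim=(2, 3))            # (B, C)
    k = max(1, int(C * focal_ratio))
    focal_idx = saliency.topk(k, dim=1).indices            # (B, k)

    # Gather the focal channels from both networks, ignoring the rest.
    idx = focal_idx[..., None, None].expand(-1, -1, H, W)
    fs = torch.gather(f_student, 1, idx).flatten(2)        # (B, k, H*W)
    ft = torch.gather(f_teacher, 1, idx).flatten(2)        # (B, k, H*W)

    # Focal channel distributions: spatial softmax per channel, matched via KL.
    p_t = F.softmax(ft / tau, dim=-1)
    log_p_s = F.log_softmax(fs / tau, dim=-1)
    loss_dist = F.kl_div(log_p_s, p_t, reduction="batchmean") * tau * tau

    # Focal channel correlations: inter-channel similarity matrices, matched via MSE.
    corr_s = F.normalize(fs, dim=-1) @ F.normalize(fs, dim=-1).transpose(1, 2)
    corr_t = F.normalize(ft, dim=-1) @ F.normalize(ft, dim=-1).transpose(1, 2)
    loss_corr = F.mse_loss(corr_s, corr_t)

    return loss_dist + loss_corr
```

In this sketch the loss would be added, with a weighting factor, to the RGB student's standard classification loss, with one such term per multi-modality teacher (e.g. depth or skeleton); the exact weighting and teacher setup follow the paper, which this snippet does not reproduce.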

Keywords