Privacy-Safe Action Recognition via Cross-Modality Distillation

Yuhyun Kim; Jinwook Jung; Hyeoncheol Noh; Byungtae Ahn; Junghye Kwon; Dong-Geol Choi

doi:10.1109/ACCESS.2024.3431227

IEEE Access (Jan 2024)

Affiliations

Yuhyun Kim: Department of Information and Communication Engineering, Hanbat National University, Daejeon, Republic of Korea
Jinwook Jung: Department of Information and Communication Engineering, Hanbat National University, Daejeon, Republic of Korea
Hyeoncheol Noh: Department of Information and Communication Engineering, Hanbat National University, Daejeon, Republic of Korea
Byungtae Ahn: Korea Institute of Machinery and Materials, Daejeon, Republic of Korea
Junghye Kwon: Department of Internal Medicine, Division of Hematology and Oncology, College of Medicine, Chungnam National University, Daejeon, Republic of Korea
Dong-Geol Choi: Department of Information and Communication Engineering, Hanbat National University, Daejeon, Republic of Korea

DOI: https://doi.org/10.1109/ACCESS.2024.3431227
Journal volume & issue: Vol. 12
pp. 125955 – 125965

Abstract

Human action recognition systems enhance public safety by autonomously detecting abnormal behavior. The RGB sensors commonly used in such systems capture personal information about their subjects and therefore risk privacy leakage. Privacy-safe alternatives, such as depth or thermal sensors, perform worse because they lack the semantic context that RGB sensors provide. Moreover, far less data is available for privacy-safe modalities than for RGB. To address these problems, we explore effective cross-modality distillation methods, aiming to distill the knowledge of context-rich, large-scale pre-trained RGB-based models into privacy-safe depth-based models. Based on extensive experiments on multiple architectures and benchmark datasets, we propose an effective method for training privacy-safe depth-based action recognition models via cross-modality distillation: cross-modality mixing distillation. This approach improves both performance and efficiency by enabling interaction between the depth and RGB modalities through a linear combination of their features. Using the proposed cross-modality mixing distillation approach, we achieve state-of-the-art accuracy on two depth-based action recognition benchmarks. The code and the pre-trained models will be available upon publication.
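
The abstract describes the mixing step only as "a linear combination of their features", so the following is a minimal sketch of what such a combination could look like, not the authors' implementation. The function names `mix_features` and `mse`, the mixing weight `lam`, and the use of a mean-squared-error feature loss against the RGB features are all assumptions made for illustration; the paper may define the combination and the distillation objective differently.

```python
from typing import List, Sequence


def mix_features(depth_feat: Sequence[float],
                 rgb_feat: Sequence[float],
                 lam: float) -> List[float]:
    """Hypothetical mixing: lam * depth + (1 - lam) * rgb, element-wise."""
    if len(depth_feat) != len(rgb_feat):
        raise ValueError("feature vectors must have the same length")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * d + (1.0 - lam) * r for d, r in zip(depth_feat, rgb_feat)]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Toy feature vectors standing in for depth (student) and RGB (teacher) features.
depth_feat = [0.0, 2.0, 4.0]
rgb_feat = [1.0, 1.0, 1.0]

# Mix the two modalities with equal weight (assumed value, not from the paper).
mixed = mix_features(depth_feat, rgb_feat, lam=0.5)

# Assumed feature-level distillation signal: distance from mixed to RGB features.
loss = mse(mixed, rgb_feat)
```

With `lam=1.0` the mixed features reduce to the depth features alone, and with `lam=0.0` to the RGB features alone, so in this sketch `lam` controls how much RGB context is blended into the depth representation.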

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
