A lightweight hybrid vision transformer network for radar-based human activity recognition

Sha Huan; Zhaoyue Wang; Xiaoqiang Wang; Limei Wu; Xiaoxuan Yang; Hongming Huang; Gan E. Dai

doi:10.1038/s41598-023-45149-5

Scientific Reports (Oct 2023)

Affiliations

Sha Huan: School of Electronics and Communication Engineering, Guangzhou University
Zhaoyue Wang: School of Electronics and Communication Engineering, Guangzhou University
Xiaoqiang Wang: College of Naval Architecture and Ocean Engineering, Naval University of Engineering
Limei Wu: School of Electronics and Communication Engineering, Guangzhou University
Xiaoxuan Yang: School of Electronics and Communication Engineering, Guangzhou University
Hongming Huang: School of Electronics and Communication Engineering, Guangzhou University
Gan E. Dai: School of Electronic Information Engineering, Foshan University

DOI: https://doi.org/10.1038/s41598-023-45149-5
Journal volume & issue: Vol. 13, no. 1
Pages: 1–12

Abstract

Radar-based human activity recognition (HAR) is a non-contact technique that preserves privacy and is robust to lighting conditions, which makes it attractive for many advanced applications. Complex deep neural networks show significant performance advantages when classifying radar micro-Doppler signals, which correspond uniquely to human behaviors. In embedded applications, however, the demand for lightweight, low-latency models makes it challenging to construct a radar-based HAR network. This paper proposes an efficient network based on a lightweight hybrid Vision Transformer (LH-ViT) that targets HAR accuracy and a lightweight design at the same time. The network combines efficient convolution operations with the strength of the self-attention mechanism in ViT. A feature pyramid architecture extracts multi-scale features from the micro-Doppler map. The stacked Radar-ViT then enhances these features, and fold and unfold operations are added to it to lower the computational load of the attention mechanism. The convolution operator in the LH-ViT is replaced by the RES-SE block, an efficient structure that combines the residual learning framework with the Squeeze-and-Excitation network. Experiments on two human activity datasets indicate that the method outperforms traditional methods in both expressiveness and computing efficiency.
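The abstract names two building blocks without giving their details: a RES-SE block (residual learning plus Squeeze-and-Excitation) and the unfold/fold operations around attention. The NumPy sketch below illustrates the general ideas only and is not the authors' implementation. The function names, the `transform` stand-in for the convolutional branch, the weight shapes, and the MobileViT-style patch layout used for unfold/fold are all assumptions made for illustration.

```python
import numpy as np

def se_gate(x, w1, w2):
    """Squeeze-and-Excitation channel weights for a (C, H, W) feature map."""
    squeezed = x.mean(axis=(1, 2))               # squeeze: global average pool -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)      # channel reduction + ReLU -> (C // r,)
    return 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # channel expansion + sigmoid -> (C,)

def res_se_block(x, transform, w1, w2):
    """Residual block whose branch output is rescaled per channel by an SE gate.

    `transform` is a hypothetical stand-in for the convolutional branch;
    it must return an array with the same shape as `x`.
    """
    branch = transform(x)
    gate = se_gate(branch, w1, w2)
    return x + branch * gate[:, None, None]      # skip connection + gated branch

def unfold(x, ph, pw):
    """(C, H, W) -> (ph*pw, N, C), with N non-overlapping ph-by-pw patches.

    Assumed layout: attention would run over the N patches separately for
    each of the ph*pw pixel positions, rather than over all H*W pixels.
    """
    c, h, w = x.shape
    x = x.reshape(c, h // ph, ph, w // pw, pw)
    x = x.transpose(2, 4, 1, 3, 0)               # (ph, pw, nh, nw, C)
    return x.reshape(ph * pw, (h // ph) * (w // pw), c)

def fold(tokens, c, h, w, ph, pw):
    """Inverse of `unfold`: (ph*pw, N, C) -> (C, H, W)."""
    x = tokens.reshape(ph, pw, h // ph, w // pw, c)
    x = x.transpose(4, 2, 0, 3, 1)               # (C, nh, ph, nw, pw)
    return x.reshape(c, h, w)

# Toy usage on a random 4-channel, 8x8 map with reduction ratio r = 2.
rng = np.random.default_rng(0)
feature_map = rng.standard_normal((4, 8, 8))
w_reduce = rng.standard_normal((2, 4))
w_expand = rng.standard_normal((4, 2))
enhanced = res_se_block(feature_map, lambda t: 0.5 * t, w_reduce, w_expand)
tokens = unfold(enhanced, 2, 2)                  # shape (4, 16, 4)
restored = fold(tokens, 4, 8, 8, 2, 2)           # same values as `enhanced`
```

Under this assumed layout, each attention call sees 16 tokens instead of 64, which is one plausible reading of how unfolding lowers the attention cost; the paper itself should be consulted for the actual Radar-ViT design.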

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/