IEEE Transactions on Neural Systems and Rehabilitation Engineering (Jan 2023)

Alignment-Enhanced Interactive Fusion Model for Complete and Incomplete Multimodal Hand Gesture Recognition

  • Shengcai Duan,
  • Le Wu,
  • Aiping Liu,
  • Xun Chen

DOI
https://doi.org/10.1109/TNSRE.2023.3335101
Journal volume & issue
Vol. 31
pp. 4661 – 4671

Abstract


Hand gesture recognition (HGR) based on surface electromyogram (sEMG) and accelerometer (ACC) signals is increasingly attractive, and fusion strategies are crucial for its performance yet remain challenging. Currently, neural-network-based fusion methods have achieved superior performance. Nevertheless, these methods typically fuse sEMG and ACC either in the early or late stages, overlooking the integration of cross-modal hierarchical information within each individual hidden layer, which leads to inefficient inter-modal fusion. To this end, we propose a novel Alignment-Enhanced Interactive Fusion (AiFusion) model, which achieves effective fusion via a progressive hierarchical fusion strategy. Notably, AiFusion can flexibly perform both complete and incomplete multimodal HGR. Specifically, AiFusion contains two unimodal branches and a cascaded transformer-based multimodal fusion branch. The fusion branch is first designed to adequately characterize modality-interactive knowledge by adaptively capturing inter-modal similarity and fusing hierarchical features from all branches layer by layer. Then, the modality-interactive knowledge is aligned with that of each unimodality using cross-modal supervised contrastive learning and online distillation in the embedding and probability spaces, respectively. These alignments further promote fusion quality and refine modality-specific representations. Finally, the recognition outcome is determined by whichever modalities are available, which helps handle the incomplete multimodal HGR problem frequently encountered in real-world scenarios. Experimental results on five public datasets demonstrate that AiFusion outperforms most state-of-the-art benchmarks in complete multimodal HGR. Impressively, it also surpasses the unimodal baselines in the challenging incomplete multimodal HGR. The proposed AiFusion provides a promising solution for realizing effective and robust multimodal HGR-based interfaces.
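To make the described architecture concrete, the following is a minimal PyTorch sketch of this style of progressive hierarchical fusion: two unimodal branches whose per-layer features are merged, level by level, by a transformer-based fusion branch, with the final prediction formed from whichever modalities are present. All names, feature dimensions, layer counts, the class count, and the simple token-level attention used for fusion are illustrative assumptions rather than the authors' implementation; the cross-modal contrastive alignment and online distillation losses are omitted here.

```python
import torch
import torch.nn as nn


class UnimodalBranch(nn.Module):
    """Per-modality encoder that exposes every hidden layer's features
    so they can be fused hierarchically."""

    def __init__(self, in_dim, hidden_dim, num_layers=3):
        super().__init__()
        dims = [in_dim] + [hidden_dim] * num_layers
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(dims[i], dims[i + 1]), nn.ReLU())
            for i in range(num_layers)
        )

    def forward(self, x):
        feats = []
        for block in self.blocks:
            x = block(x)
            feats.append(x)  # keep each hidden layer's output
        return x, feats


class AiFusionSketch(nn.Module):
    """Hypothetical skeleton: sEMG and ACC branches plus a cascaded
    transformer fusion branch that mixes their features layer by layer,
    then classifies from the available modalities only."""

    def __init__(self, emg_dim=128, acc_dim=36, hidden_dim=64,
                 num_layers=3, num_classes=52):
        super().__init__()
        self.emg_branch = UnimodalBranch(emg_dim, hidden_dim, num_layers)
        self.acc_branch = UnimodalBranch(acc_dim, hidden_dim, num_layers)
        # One transformer encoder layer per hierarchy level.
        self.fusion_layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model=hidden_dim, nhead=4,
                                       batch_first=True)
            for _ in range(num_layers)
        )
        self.emg_head = nn.Linear(hidden_dim, num_classes)
        self.acc_head = nn.Linear(hidden_dim, num_classes)
        self.fusion_head = nn.Linear(hidden_dim, num_classes)

    def forward(self, emg=None, acc=None):
        logits = []
        emg_feats = acc_feats = None
        if emg is not None:
            emg_out, emg_feats = self.emg_branch(emg)
            logits.append(self.emg_head(emg_out))
        if acc is not None:
            acc_out, acc_feats = self.acc_branch(acc)
            logits.append(self.acc_head(acc_out))
        # The fusion branch runs only when both modalities are present.
        if emg_feats is not None and acc_feats is not None:
            fused = torch.zeros_like(emg_feats[0])
            for layer, e, a in zip(self.fusion_layers, emg_feats, acc_feats):
                # Treat the running fused state and both branches' hidden
                # features as a 3-token sequence; self-attention mixes them.
                tokens = torch.stack([fused, e, a], dim=1)
                fused = layer(tokens).mean(dim=1)
            logits.append(self.fusion_head(fused))
        # Average the heads of whichever branches produced an output,
        # so prediction degrades gracefully under missing modalities.
        return torch.stack(logits).mean(dim=0)


if __name__ == "__main__":
    model = AiFusionSketch()
    emg = torch.randn(8, 128)  # e.g. windowed sEMG features
    acc = torch.randn(8, 36)   # e.g. windowed ACC features
    print(model(emg=emg, acc=acc).shape)  # complete multimodal input
    print(model(emg=emg).shape)           # incomplete input: sEMG only
```

The key design point mirrored here is that fusion happens at every hidden level rather than once at the input or output, and that inference still works when one modality is absent because each branch keeps its own classification head.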

Keywords