Multi-Scale Masked Autoencoders for Cross-Session Emotion Recognition

Miaoqi Pang; Hongtao Wang; Jiayang Huang; Chi-Man Vong; Zhiqiang Zeng; Chuangquan Chen

doi:10.1109/tnsre.2024.3389037

IEEE Transactions on Neural Systems and Rehabilitation Engineering (Jan 2024)

Affiliations

Miaoqi Pang: School of Electronics and Information Engineering, Wuyi University, Jiangmen, China
Hongtao Wang: School of Electronics and Information Engineering, Wuyi University, Jiangmen, China
Jiayang Huang: School of Electronics and Information Engineering, Wuyi University, Jiangmen, China
Chi-Man Vong: Department of Computer and Information Science, University of Macau, Macau, China
Zhiqiang Zeng: School of Electronics and Information Engineering, Wuyi University, Jiangmen, China
Chuangquan Chen: School of Electronics and Information Engineering, Wuyi University, Jiangmen, China

Journal volume & issue: Vol. 32
pp. 1637 – 1646

Abstract

Affective brain-computer interfaces (aBCIs) have found wide application, and emotion recognition based on electroencephalogram (EEG) signals has advanced remarkably. However, developing subject-specific cross-session emotion recognition models remains challenging for four reasons: annotating EEG data is time-consuming, individuals differ substantially, EEG signals are non-stationary, and EEG recordings are contaminated by noise artifacts. To address these challenges simultaneously, we propose a unified pre-training framework based on multi-scale masked autoencoders (MSMAE). The framework exploits large-scale unlabeled EEG signals from multiple subjects and sessions to extract noise-robust, subject-invariant, and temporally invariant features. We then fine-tune the learned generalized representations with only a small amount of labeled data from a specific subject, which personalizes the model and enables cross-session emotion recognition. Our framework emphasizes: 1) multi-scale representation, which captures diverse aspects of EEG signals to obtain comprehensive information; 2) an improved masking mechanism for robust channel-level representation learning, which addresses missing channels while preserving inter-channel relationships; and 3) invariance learning for regional correlations in the spatial-level representation, which minimizes inter-subject and inter-session variance. With these designs, MSMAE can decode emotional states from EEG data recorded in a different session during the testing phase. Extensive experiments on two publicly available datasets, SEED and SEED-IV, demonstrate that MSMAE consistently achieves stable results and outperforms competitive baseline methods in cross-session emotion recognition.
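The abstract refers to masking for channel-level representation learning. The sketch below illustrates the generic masked-autoencoder idea at the channel level. It is not the MSMAE implementation, whose improved masking mechanism, multi-scale encoder, and invariance loss are described in the paper itself. The sketch hides a random subset of whole EEG channels and scores reconstruction only on the hidden ones.

Everything in the sketch is an assumption made for illustration:
- the hypothetical helper names `mask_channels` and `masked_reconstruction_loss`;
- the 62-channel by 5-band input shape;
- the 0.5 mask ratio;
- zero-filling in place of a learned mask token.

```python
import numpy as np


def mask_channels(x, mask_ratio, rng):
    """Hide a random subset of whole EEG channels (hypothetical helper).

    x: array of shape (n_channels, n_features), e.g. per-channel band features.
    Returns the masked copy of x and a boolean vector marking hidden channels.
    """
    n_channels = x.shape[0]
    n_masked = int(round(mask_ratio * n_channels))
    # Sample distinct channel indices to hide.
    masked_idx = rng.choice(n_channels, size=n_masked, replace=False)
    mask = np.zeros(n_channels, dtype=bool)
    mask[masked_idx] = True
    x_masked = x.copy()
    x_masked[mask] = 0.0  # assumption: zeros stand in for a learnable mask token
    return x_masked, mask


def masked_reconstruction_loss(x, x_hat, mask):
    """Mean squared error computed only over the masked (hidden) channels."""
    diff = x[mask] - x_hat[mask]
    return float(np.mean(diff ** 2))


rng = np.random.default_rng(0)
x = rng.standard_normal((62, 5))  # assumed: 62 channels x 5 frequency bands
x_masked, mask = mask_channels(x, mask_ratio=0.5, rng=rng)

# A real model would pass x_masked through an encoder-decoder to get x_hat.
# Here x_hat is just the masked input, so the loss is the mean square of the
# hidden values (a trivial "predict zero" baseline).
loss = masked_reconstruction_loss(x, x_masked, mask)
print(int(mask.sum()), loss)
```

Restricting the loss to masked channels follows the original masked-autoencoder recipe. A model cannot lower the loss by copying visible inputs, so it must infer the missing channels from the remaining ones. That is the kind of inter-channel dependence the abstract says the masking mechanism should preserve.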

Published in IEEE Transactions on Neural Systems and Rehabilitation Engineering

ISSN: 1534-4320 (Print); 1558-0210 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Medicine: Medicine (General): Medical technology; Medicine: Therapeutics. Pharmacology
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=7333