IEEE Open Journal of Signal Processing (Jan 2024)
Attention and Sequence Modeling for Match-Mismatch Classification of Speech Stimulus and EEG Response
Abstract
For the development of neuro-steered hearing aids, it is important to study the relationship between a speech stimulus and the EEG response it elicits in a human listener. The recent Auditory EEG Decoding Challenge 2023 (Signal Processing Grand Challenge, IEEE International Conference on Acoustics, Speech and Signal Processing) addressed this relationship in the context of a match-mismatch classification task: given two speech stimuli, determine which one elicited a specific EEG response. Participating in the challenge, we adopted the challenge's baseline model and explored an attention encoder to replace the spatial convolution in the EEG processing pipeline, as well as additional sequence modeling methods based on RNN, LSTM, and GRU to preprocess the speech stimuli. We compared speech envelopes and mel-spectrograms as two types of input speech representation and evaluated our models on a test set as well as on held-out stories and held-out subjects benchmark sets. In this work, we show that mel-spectrograms generally yield better results. Replacing the spatial convolution with an attention encoder captures spatial and temporal information in the EEG response more effectively. Additionally, the sequence modeling methods further enhance performance when mel-spectrograms are used. Consequently, both modifications lead to higher performance on the test set and the held-out stories benchmark set. Our best model outperforms the baseline by 1.91% on the test set and by 1.35% on the total ranking score. We ranked second in the challenge.
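To make the described architecture concrete, the following is a minimal sketch (not the authors' or the challenge organizers' code) of a match-mismatch classifier in which a multi-head self-attention encoder replaces the spatial convolution on the EEG and a GRU models the speech stimulus (envelope or mel-spectrogram). The framework (PyTorch), layer sizes, window lengths, and the cosine-similarity scoring are illustrative assumptions, not details taken from the paper.

import torch
import torch.nn as nn


class MatchMismatchModel(nn.Module):
    def __init__(self, eeg_channels=64, stim_features=28, d_model=64):
        super().__init__()
        # Project the EEG channels to the model dimension, then apply
        # self-attention over time to capture spatial-temporal structure
        # (replacing a spatial convolution across channels).
        self.eeg_proj = nn.Linear(eeg_channels, d_model)
        self.eeg_attention = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True),
            num_layers=1,
        )
        # Sequence model over the speech representation (mel-spectrogram
        # bands or a one-dimensional envelope).
        self.stim_rnn = nn.GRU(stim_features, d_model, batch_first=True)

    def encode(self, eeg, stim):
        eeg_emb = self.eeg_attention(self.eeg_proj(eeg))   # (B, T, d_model)
        stim_emb, _ = self.stim_rnn(stim)                   # (B, T, d_model)
        return eeg_emb.mean(dim=1), stim_emb.mean(dim=1)    # pool over time

    def forward(self, eeg, stim_a, stim_b):
        # Score each candidate stimulus against the EEG embedding; the
        # higher-scoring candidate is predicted as the matching stimulus.
        eeg_emb, a_emb = self.encode(eeg, stim_a)
        _, b_emb = self.encode(eeg, stim_b)
        score_a = nn.functional.cosine_similarity(eeg_emb, a_emb, dim=-1)
        score_b = nn.functional.cosine_similarity(eeg_emb, b_emb, dim=-1)
        return torch.stack([score_a, score_b], dim=-1)       # logits over {A, B}


# Example: 5-second windows of 64-channel EEG at 64 Hz, two candidate
# 28-band mel-spectrogram segments (all dimensions are illustrative).
model = MatchMismatchModel()
eeg = torch.randn(8, 320, 64)
stim_a, stim_b = torch.randn(8, 320, 28), torch.randn(8, 320, 28)
logits = model(eeg, stim_a, stim_b)   # (8, 2); argmax picks the matching stimulus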
Keywords