IEEE Open Journal of Signal Processing (Jan 2024)
Attention and Sequence Modeling for Match-Mismatch Classification of Speech Stimulus and EEG Response
Abstract
For the development of neuro-steered hearing aids, it is important to study the relationship between a speech stimulus and the EEG response it elicits in a human listener. The recent Auditory EEG Decoding Challenge 2023 (Signal Processing Grand Challenge, IEEE International Conference on Acoustics, Speech and Signal Processing) addressed this relationship in the context of a match-mismatch classification task: given two speech stimuli, determine which one elicited a specific EEG response. Participating in the challenge, we adopted the challenge's baseline model and explored an attention encoder to replace the spatial convolution in the EEG processing pipeline, as well as additional sequence modeling methods based on RNN, LSTM, and GRU to preprocess the speech stimuli. We compared speech envelopes and mel-spectrograms as two types of input speech representation and evaluated our models on a test set as well as on held-out stories and held-out subjects benchmark sets. In this work, we show that mel-spectrograms generally yield better results. Replacing the spatial convolution with an attention encoder captures spatial and temporal information in the EEG response more effectively. Additionally, the sequence modeling methods further enhance performance when mel-spectrograms are used. Consequently, both modifications lead to higher performance on the test set and the held-out stories benchmark set. Our best model outperforms the baseline by 1.91% on the test set and by 1.35% on the total ranking score. We ranked second in the challenge.
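To make the described architecture concrete, the following is a minimal sketch (not the authors' or the challenge organizers' code) of a match-mismatch classifier in which a multi-head self-attention encoder replaces the spatial convolution on the EEG and a GRU models the speech stimulus (envelope or mel-spectrogram). The framework (PyTorch), layer sizes, window lengths, and the cosine-similarity scoring are illustrative assumptions, not details taken from the paper.

import torch
import torch.nn as nn


class MatchMismatchModel(nn.Module):
    def __init__(self, eeg_channels=64, stim_features=28, d_model=64):
        super().__init__()
        # Project the EEG channels to the model dimension, then apply
        # self-attention over time to capture spatial-temporal structure
        # (replacing a spatial convolution across channels).
        self.eeg_proj = nn.Linear(eeg_channels, d_model)
        self.eeg_attention = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True),
            num_layers=1,
        )
        # Sequence model over the speech representation (mel-spectrogram
        # bands or a one-dimensional envelope).
        self.stim_rnn = nn.GRU(stim_features, d_model, batch_first=True)

    def encode(self, eeg, stim):
        eeg_emb = self.eeg_attention(self.eeg_proj(eeg))   # (B, T, d_model)
        stim_emb, _ = self.stim_rnn(stim)                   # (B, T, d_model)
        return eeg_emb.mean(dim=1), stim_emb.mean(dim=1)    # pool over time

    def forward(self, eeg, stim_a, stim_b):
        # Score each candidate stimulus against the EEG embedding; the
        # higher-scoring candidate is predicted as the matching stimulus.
        eeg_emb, a_emb = self.encode(eeg, stim_a)
        _, b_emb = self.encode(eeg, stim_b)
        score_a = nn.functional.cosine_similarity(eeg_emb, a_emb, dim=-1)
        score_b = nn.functional.cosine_similarity(eeg_emb, b_emb, dim=-1)
        return torch.stack([score_a, score_b], dim=-1)       # logits over {A, B}


# Example: 5-second windows of 64-channel EEG at 64 Hz, two candidate
# 28-band mel-spectrogram segments (all dimensions are illustrative).
model = MatchMismatchModel()
eeg = torch.randn(8, 320, 64)
stim_a, stim_b = torch.randn(8, 320, 28), torch.randn(8, 320, 28)
logits = model(eeg, stim_a, stim_b)   # (8, 2); argmax picks the matching stimulus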
Keywords