SparrKULee: A Speech-Evoked Auditory Response Repository from KU Leuven, Containing the EEG of 85 Participants

Bernd Accou; Lies Bollens; Marlies Gillis; Wendy Verheijen; Hugo Van hamme; Tom Francart

doi:10.3390/data9080094

Data (Jul 2024)

Affiliations

Bernd Accou: Experimental Oto-Rhino-Laryngology (ExpORL), Department Neurosciences, KU Leuven, B-3001 Leuven, Belgium
Lies Bollens: Experimental Oto-Rhino-Laryngology (ExpORL), Department Neurosciences, KU Leuven, B-3001 Leuven, Belgium
Marlies Gillis: Experimental Oto-Rhino-Laryngology (ExpORL), Department Neurosciences, KU Leuven, B-3001 Leuven, Belgium
Wendy Verheijen: Experimental Oto-Rhino-Laryngology (ExpORL), Department Neurosciences, KU Leuven, B-3001 Leuven, Belgium
Hugo Van hamme: Processing Speech and Images (PSI), Department of Electrical Engineering (ESAT), KU Leuven, B-3001 Leuven, Belgium
Tom Francart: Experimental Oto-Rhino-Laryngology (ExpORL), Department Neurosciences, KU Leuven, B-3001 Leuven, Belgium

Journal volume & issue: Vol. 9, no. 8, p. 94

Abstract

Researchers investigating the neural mechanisms underlying speech perception often employ electroencephalography (EEG) to record brain activity while participants listen to spoken language. The high temporal resolution of EEG enables the study of neural responses to fast and dynamic speech signals. Previous studies have successfully extracted speech characteristics from EEG data and, conversely, predicted EEG activity from speech features. Machine learning techniques are generally employed to construct encoding and decoding models, which necessitate a substantial quantity of data. We present SparrKULee, a Speech-evoked Auditory Repository of EEG data, measured at KU Leuven, comprising 64-channel EEG recordings from 85 young individuals with normal hearing, each of whom listened to 90–150 min of natural speech. This dataset is more extensive than any currently available dataset in terms of both the number of participants and the quantity of data per participant. It is suitable for training larger machine learning models. We evaluate the dataset using linear and state-of-the-art non-linear models in a speech encoding/decoding and match/mismatch paradigm, providing benchmark scores for future research.

Published in Data

ISSN: 2306-5729 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Bibliography. Library science. Information resources
Website: http://www.mdpi.com/journal/data
