IEEE Access (Jan 2022)

Self-Supervised Learning of Neural Speech Representations From Unlabeled Intracranial Signals

  • Srdjan Lesaja
  • Morgan Stuart
  • Jerry J. Shih
  • Pedram Z. Soroush
  • Tanja Schultz
  • Milos Manic
  • Dean J. Krusienski

DOI: https://doi.org/10.1109/ACCESS.2022.3230688
Journal volume & issue: Vol. 10, pp. 133526–133538

Abstract

Neuroprosthetics have demonstrated the potential to decode speech from intracranial brain signals, and hold promise for one day returning the ability to speak to those who have lost it. However, data in this domain are scarce, highly variable, and costly to label for supervised modeling. To address these constraints, we present brain2vec, a transformer-based approach for learning feature representations from intracranial electroencephalogram data. Brain2vec combines a self-supervised learning methodology, neuroanatomical positional embeddings, and the contextual representations of transformers to achieve three novel capabilities: (1) learning from unlabeled intracranial brain signals, (2) learning from multiple participants simultaneously, and (3) doing both while utilizing only raw, unprocessed data. To assess our approach, we use a leave-one-participant-out validation procedure that separates brain2vec’s feature learning from the held-out participant’s supervised, speech-related classification tasks. With only two linear layers, we achieve 90% accuracy on a canonical speech detection task, 42% accuracy on a more challenging 4-class speech-related behavior recognition task, and 53% accuracy on a 10-class, few-shot word classification task. Combined with visualizations of unsupervised class separation in the learned features, our results demonstrate brain2vec’s ability to learn highly generalized representations of neural activity without the need for labels or consistent sensor locations.
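
To make the evaluation protocol concrete, the sketch below illustrates the two pieces the abstract quantifies: the leave-one-participant-out split, and a supervised probe built from only two linear layers on top of frozen, pretrained features. This is a minimal PyTorch sketch under our own assumptions, not the authors' implementation; names such as ProbeHead and leave_one_participant_out are hypothetical, and the nonlinearity between the two layers is an assumption.

    import torch
    import torch.nn as nn

    class ProbeHead(nn.Module):
        # A probe of "only two linear layers", per the abstract.
        def __init__(self, feat_dim, hidden_dim, n_classes):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(feat_dim, hidden_dim),
                nn.ReLU(),  # assumption: a nonlinearity between the two layers
                nn.Linear(hidden_dim, n_classes),
            )

        def forward(self, feats):
            return self.net(feats)

    def leave_one_participant_out(participants, holdout):
        # Pretraining excludes the held-out participant entirely, so that
        # self-supervised feature learning never sees the participant used
        # for the supervised classification tasks.
        return [p for p in participants if p != holdout], holdout

    # Hypothetical usage: pretrain brain2vec-style features on pretrain_ids,
    # freeze them, then train ProbeHead on the held-out participant's labels.
    pretrain_ids, probe_id = leave_one_participant_out(
        ["P1", "P2", "P3", "P4"], "P3")
    probe = ProbeHead(feat_dim=256, hidden_dim=128, n_classes=4)
    logits = probe(torch.randn(8, 256))  # batch of 8 frozen feature vectors

The split keeps feature learning and probing strictly separate, which is what lets the reported accuracies speak to cross-participant generalization rather than participant-specific fit.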

Keywords