A representation of abstract linguistic categories in the visual system underlies successful lipreading

Aaron R Nidiffer; Cody Zhewei Cao; Aisling O'Sullivan; Edmund C Lalor

NeuroImage (Nov 2023)

Affiliations

Aaron R Nidiffer: Department of Biomedical Engineering, Department of Neuroscience, Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, USA
Cody Zhewei Cao: Department of Psychology, University of Michigan, Ann Arbor, MI, USA
Aisling O'Sullivan: School of Engineering, Trinity College Institute of Neuroscience, Trinity Centre for Biomedical Engineering, Trinity College, Dublin, Ireland
Edmund C Lalor: Department of Biomedical Engineering, Department of Neuroscience, Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, USA; School of Engineering, Trinity College Institute of Neuroscience, Trinity Centre for Biomedical Engineering, Trinity College, Dublin, Ireland; Corresponding author at: Department of Biomedical Engineering, University of Rochester, 204 Robert B. Goergen Hall, P.O. Box 270168, Rochester, NY 14627, USA.

Journal volume & issue: Vol. 282, p. 120391

Abstract

There is considerable debate over how visual speech is processed in the absence of sound and whether neural activity supporting lipreading occurs in visual brain areas. Much of the ambiguity stems from a lack of behavioral grounding and neurophysiological analyses that cannot disentangle high-level linguistic and phonetic/energetic contributions from visual speech. To address this, we recorded EEG from human observers as they watched silent videos, half of which were novel and half of which were previously rehearsed with the accompanying audio. We modeled how the EEG responses to novel and rehearsed silent speech reflected the processing of low-level visual features (motion, lip movements) and a higher-level categorical representation of linguistic units, known as visemes. The ability of these visemes to account for the EEG – beyond the motion and lip movements – was significantly enhanced for rehearsed videos in a way that correlated with participants’ trial-by-trial ability to lipread that speech. Source localization of viseme processing showed clear contributions from visual cortex, with no strong evidence for the involvement of auditory areas. We interpret this as support for the idea that the visual system produces its own specialized representation of speech that is (1) well-described by categorical linguistic features, (2) dissociable from lip movements, and (3) predictive of lipreading ability. We also suggest a reinterpretation of previous findings of auditory cortical activation during silent speech that is consistent with hierarchical accounts of visual and audiovisual speech perception.

Published in NeuroImage

ISSN: 1053-8119 (Print); 1095-9572 (Online)
Publisher: Elsevier
Country of publisher: United States
LCC subjects: Medicine: Internal medicine: Neurosciences. Biological psychiatry. Neuropsychiatry
Website: https://www.journals.elsevier.com/neuroimage
