The integration of continuous audio and visual speech in a cocktail-party environment depends on attention

Farhin Ahmed; Aaron R. Nidiffer; Aisling E. O'Sullivan; Nathaniel J. Zuk; Edmund C. Lalor

NeuroImage (Jul 2023)

The integration of continuous audio and visual speech in a cocktail-party environment depends on attention

Farhin Ahmed,
Aaron R. Nidiffer,
Aisling E. O'Sullivan,
Nathaniel J. Zuk,
Edmund C. Lalor

Affiliations

Farhin Ahmed: Department of Biomedical Engineering, Department of Neuroscience, and Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY 14627, USA
Aaron R. Nidiffer: Department of Biomedical Engineering, Department of Neuroscience, and Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY 14627, USA
Aisling E. O'Sullivan: Department of Biomedical Engineering, Department of Neuroscience, and Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY 14627, USA; School of Engineering, Trinity Centre for Biomedical Engineering, and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland
Nathaniel J. Zuk: Edmond & Lily Safra Center for Brain Sciences, Hebrew University, Jerusalem, Israel
Edmund C. Lalor: Department of Biomedical Engineering, Department of Neuroscience, and Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY 14627, USA; School of Engineering, Trinity Centre for Biomedical Engineering, and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland; Corresponding author.

Journal volume & issue: Vol. 274
p. 120143

Abstract

Read online

In noisy environments, our ability to understand speech benefits greatly from seeing the speaker's face. This is attributed to the brain's ability to integrate audio and visual information, a process known as multisensory integration. In addition, selective attention plays an enormous role in what we understand, the so-called cocktail-party phenomenon. But how attention and multisensory integration interact remains incompletely understood, particularly in the case of natural, continuous speech. Here, we addressed this issue by analyzing EEG data recorded from participants who undertook a multisensory cocktail-party task using natural speech. To assess multisensory integration, we modeled the EEG responses to the speech in two ways. The first assumed that audiovisual speech processing is simply a linear combination of audio speech processing and visual speech processing (i.e., an A + V model), while the second allows for the possibility of audiovisual interactions (i.e., an AV model). Applying these models to the data revealed that EEG responses to attended audiovisual speech were better explained by an AV model, providing evidence for multisensory integration. In contrast, unattended audiovisual speech responses were best captured using an A + V model, suggesting that multisensory integration is suppressed for unattended speech. Follow up analyses revealed some limited evidence for early multisensory integration of unattended AV speech, with no integration occurring at later levels of processing. We take these findings as evidence that the integration of natural audio and visual speech occurs at multiple levels of processing in the brain, each of which can be differentially affected by attention.

Published in NeuroImage

ISSN: 1053-8119 (Print); 1095-9572 (Online)
Publisher: Elsevier
Country of publisher: United States
LCC subjects: Medicine: Internal medicine: Neurosciences. Biological psychiatry. Neuropsychiatry
Website: https://www.journals.elsevier.com/neuroimage

About the journal

Abstract

Keywords