NeuroImage (May 2023)
Cross-modal interactions at the audiovisual cocktail-party revealed by behavior, ERPs, and neural oscillations
Abstract
Theories of attention argue that objects are the units of attentional selection. In real-world environments, such objects can contain both visual and auditory features. To understand how mechanisms of selective attention operate in multisensory environments, in this pre-registered study we created an audiovisual cocktail-party situation in which two speakers (left and right of fixation) simultaneously articulated brief numerals. In three separate blocks, informative auditory speech was presented (a) alone, or paired with (b) congruent or (c) uninformative visual speech. In all blocks, subjects localized a pre-defined numeral. Both congruent and uninformative visual speech improved response times and, according to diffusion modeling, the speed of information uptake; yet an ERP analysis revealed that this did not coincide with enhanced attentional engagement. Consistent with object-based attentional selection, however, the deployment of auditory spatial attention (N2ac) was accompanied by visuo-spatial attentional orienting (N2pc), irrespective of the informational content of visual speech. Notably, an N2pc component was absent in the auditory-only condition, demonstrating that a sound-induced shift of visuo-spatial attention relies on the availability of audiovisual features evolving coherently in time. Additional exploratory analyses revealed cross-modal interactions in working memory and modulations of cognitive control.
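The abstract links faster information uptake to diffusion modeling, where the drift rate indexes the speed of evidence accumulation. As a minimal, self-contained illustration (not the authors' analysis pipeline, which the abstract does not specify), the sketch below recovers drift rate, boundary separation, and non-decision time from per-condition summary statistics using the EZ-diffusion closed-form equations (Wagenmakers et al., 2007); the function name and the example numbers are hypothetical.

```python
import numpy as np

def ez_diffusion(p_correct, rt_var, rt_mean, s=0.1):
    """Estimate drift rate (v), boundary separation (a), and non-decision time
    (Ter) from accuracy, the variance of correct RTs, and the mean of correct
    RTs (in seconds), via the EZ-diffusion equations. s is the conventional
    scaling parameter."""
    # EZ-diffusion is undefined at accuracies of exactly 0, 0.5, or 1;
    # an edge correction must be applied before calling this function.
    if p_correct in (0.0, 0.5, 1.0):
        raise ValueError("Apply an edge correction to p_correct first.")
    L = np.log(p_correct / (1.0 - p_correct))            # logit of accuracy
    x = L * (L * p_correct**2 - L * p_correct + p_correct - 0.5) / rt_var
    v = np.sign(p_correct - 0.5) * s * x**0.25           # drift rate: speed of information uptake
    a = s**2 * L / v                                      # boundary separation: response caution
    y = -v * a / s**2
    mdt = (a / (2.0 * v)) * (1.0 - np.exp(y)) / (1.0 + np.exp(y))  # mean decision time
    ter = rt_mean - mdt                                   # non-decision time (encoding + motor)
    return v, a, ter

# Hypothetical summary statistics for one condition (not data from the study):
v, a, ter = ez_diffusion(p_correct=0.94, rt_var=0.04, rt_mean=0.62)
print(f"drift rate v = {v:.3f}, boundary a = {a:.3f}, Ter = {ter:.3f} s")
```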