Multisensory benefits for speech recognition in noisy environments

Yonghee Oh; Meg Schwalm; Nicole Kalpin

doi:10.3389/fnins.2022.1031424

Frontiers in Neuroscience (Oct 2022)

Affiliations

Yonghee Oh: Department of Otolaryngology-Head and Neck Surgery and Communicative Disorders, University of Louisville, Louisville, KY, United States
Yonghee Oh: Department of Speech, Language, and Hearing Sciences, University of Florida, Gainesville, FL, United States
Meg Schwalm: Department of Speech, Language, and Hearing Sciences, University of Florida, Gainesville, FL, United States
Nicole Kalpin: Department of Speech, Language, and Hearing Sciences, University of Florida, Gainesville, FL, United States

DOI: https://doi.org/10.3389/fnins.2022.1031424
Journal volume & issue: Vol. 16

Abstract

In a series of previous studies, we explored whether an abstract visual representation of the amplitude envelope of target sentences can benefit speech perception in complex listening environments. The purpose of this study was to extend that auditory-visual approach to the tactile domain. Twenty adults completed speech recognition measurements in four sensory modalities (AO, auditory-only; AV, auditory-visual; AT, auditory-tactile; AVT, auditory-visual-tactile). The target sentences were fixed at 65 dB sound pressure level and embedded in a simultaneous speech-shaped noise masker at signal-to-noise ratios (SNRs) of −7, −5, −3, −1, and 1 dB. The amplitudes of both the abstract visual and the vibrotactile stimuli were temporally synchronized with the target speech envelope for comparison. On average, adding temporally synchronized multimodal cues to the auditory signal significantly improved word recognition performance in all three multimodal conditions (AV, AT, and AVT), especially at the lower SNRs of −7, −5, and −3 dB, for both male (8–20% improvement) and female (5–25% improvement) talkers. The greatest improvement (15–19% for male talkers and 14–25% for female talkers) was observed when visual and tactile cues were combined (AVT). The results also indicate that temporally synchronized abstract visual and vibrotactile stimuli combine additively in their influence on speech recognition performance. Our findings suggest that multisensory integration in speech perception requires salient temporal cues to enhance speech recognition in noisy environments.
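
The following Python sketch illustrates the two signal-level ideas named in the abstract: mixing a fixed-level target with a masker at a chosen SNR, and deriving an amplitude envelope that could drive a visual or vibrotactile cue. It is not the study's method. The function names (`scale_noise_to_snr`, `amplitude_envelope`), the moving-average envelope, the synthetic amplitude-modulated tone used as a target, and the white-noise masker are all assumptions made for illustration; the abstract does not specify how the envelope was extracted, and the study used speech-shaped noise.

```python
import math
import random


def rms(x):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def scale_noise_to_snr(target, noise, snr_db):
    """Scale the noise so that 20*log10(rms(target)/rms(noise)) equals snr_db.

    The target level is left untouched, mirroring a fixed-level target
    with a masker whose level sets the SNR.
    """
    gain = rms(target) / (rms(noise) * 10 ** (snr_db / 20))
    return [gain * v for v in noise]


def measured_snr_db(target, noise):
    """SNR in dB computed from the RMS levels of the two signals."""
    return 20 * math.log10(rms(target) / rms(noise))


def amplitude_envelope(signal, window):
    """Crude envelope (an assumption): moving average of the rectified signal."""
    rect = [abs(v) for v in signal]
    out = []
    acc = 0.0
    for i, v in enumerate(rect):
        acc += v
        if i >= window:
            acc -= rect[i - window]  # drop the sample that left the window
        out.append(acc / min(i + 1, window))
    return out


random.seed(0)
fs = 8000  # hypothetical sampling rate in Hz

# Placeholder "target": a 200 Hz tone with 4 Hz amplitude modulation.
target = [
    (0.5 + 0.5 * math.sin(2 * math.pi * 4 * n / fs))
    * math.sin(2 * math.pi * 200 * n / fs)
    for n in range(fs)
]
# Placeholder masker: white Gaussian noise (not speech-shaped).
noise = [random.gauss(0.0, 1.0) for _ in range(fs)]

snr_levels = (-7, -5, -3, -1, 1)  # SNRs listed in the abstract, in dB
mixtures = {}
for snr in snr_levels:
    scaled = scale_noise_to_snr(target, noise, snr)
    mixtures[snr] = [t + m for t, m in zip(target, scaled)]
    print(snr, round(measured_snr_db(target, scaled), 2))

# Envelope of the clean target, smoothed over 20 ms.
envelope = amplitude_envelope(target, window=fs // 50)
```

Under these assumptions, `envelope` follows the slow 4 Hz modulation of the target regardless of the masker level, which is the property a temporally synchronized visual or tactile cue would rely on.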

Published in Frontiers in Neuroscience

ISSN: 1662-4548 (Print); 1662-453X (Online)
Publisher: Frontiers Media S.A.
Country of publisher: Switzerland
LCC subjects: Medicine: Internal medicine: Neurosciences. Biological psychiatry. Neuropsychiatry
Website: http://www.frontiersin.org/neuroscience
