Optimizing Speech Recognition Using a Computational Model of Human Hearing: Effect of Noise Type and Efferent Time Constants

Ifat Yasin; Vit Drga; Fangqi Liu; Andreas Demosthenous; Ray Meddis

doi:10.1109/ACCESS.2020.2981885

IEEE Access (Jan 2020)

Affiliations

Ifat Yasin: Department of Computer Science, University College London, London, U.K.
Vit Drga: Department of Computer Science, University College London, London, U.K.
Fangqi Liu: Department of Electronic and Electrical Engineering, University College London, London, U.K.
Andreas Demosthenous: Department of Electronic and Electrical Engineering, University College London, London, U.K.
Ray Meddis: Department of Psychology, University of Essex, Colchester, U.K.

DOI: https://doi.org/10.1109/ACCESS.2020.2981885
Journal volume & issue: Vol. 8, pp. 56711–56719

Abstract

Physiological and psychophysical methods allow for an extended investigation of ascending (afferent) neural pathways from the ear to the brain in mammals, and their role in enhancing signals in noise. However, there is increased interest in descending (efferent) neural fibers in the mammalian auditory pathway. This efferent pathway operates via the olivocochlear system, modifying auditory processing by cochlear innervation and enhancing human ability to detect sounds in noisy backgrounds. Effective speech intelligibility may depend on a complex interaction between efferent time-constants and types of background noise. In this study, an auditory model with efferent-inspired processing provided the front-end to an automatic-speech-recognition system (ASR), used as a tool to evaluate speech recognition with changes in time-constants (50 to 2000 ms) and background noise type (unmodulated and modulated noise). With efferent activation, maximal speech recognition improvement (for both noise types) occurred for signal-to-noise ratios around 10 dB, characteristic of real-world speech-listening situations. Net speech improvement due to efferent activation (NSIEA) was smaller in modulated noise than in unmodulated noise. For unmodulated noise, NSIEA increased with increasing time-constant. For modulated noise, NSIEA increased for time-constants up to 200 ms but remained similar for longer time-constants, consistent with speech-envelope modulation times important to speech recognition in modulated noise. The model improves our understanding of the complex interactions involved in speech recognition in noise, and could be used to simulate the difficulties of speech perception in noise as a consequence of different types of hearing loss.

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
