Journal of Otorhinolaryngology, Hearing and Balance Medicine (Apr 2024)
The Intelligibility Benefits of Modern Computer-Synthesized Speech for Normal-Hearing and Hearing-Impaired Listeners in Non-Ideal Listening Conditions
Abstract
Speech intelligibility is a concern for public health, especially in non-ideal listening conditions where listeners often listen to the target speech in the presence of background noise. With advances in technology, synthetic speech has been increasingly used in lieu of actual human voices in human–machine interfaces, such as public announcement systems, answering machines, virtual personal assistants, and GPS, to interact with users. However, previous studies showed that speech generated by computer speech synthesizers was often intrinsically less natural and intelligible than natural speech produced by human speakers. In terms of noise, listening to synthetic speech is challenging for listeners with normal hearing (NH), not to mention for hearing-impaired (HI) listeners. Recent developments in speech synthesis have significantly improved the naturalness of synthetic speech. In this study, the intelligibility of speech generated by commercial synthesizers from Google, Amazon, and Microsoft was evaluated by both NH and HI listeners in different noise conditions. Compared to a natural female voice as the baseline, listeners’ listening performance suggested that some of the synthetic speech was significantly more intelligible even at rather adverse listening conditions for the NH cohort. Further acoustical analyses revealed that elongated vowel sounds and reduced spectral tilt were primarily responsible for improved intelligibility for NH, but not for HI due to their impairment at high frequencies and possible cognitive decline associated with aging.
Keywords