PLoS Biology (Feb 2024)

Original speech and its echo are segregated and separately processed in the human brain.

  • Jiaxin Gao,
  • Honghua Chen,
  • Mingxuan Fang,
  • Nai Ding

DOI: https://doi.org/10.1371/journal.pbio.3002498
Journal volume & issue: Vol. 22, no. 2, p. e3002498

Abstract

Speech recognition crucially relies on slow temporal modulations (<16 Hz) in speech. Recent studies, however, have demonstrated that long-delay echoes, which are common in online conferencing, can eliminate crucial temporal modulations in speech without affecting speech intelligibility. Here, we investigated the underlying neural mechanisms. MEG experiments demonstrated that cortical activity can effectively track the temporal modulations eliminated by an echo, an effect that cannot be fully explained by basic neural adaptation mechanisms. Furthermore, cortical responses to echoic speech were better explained by a model that segregates speech from its echo than by a model that encodes echoic speech as a whole. The speech segregation effect was observed even when attention was diverted, but disappeared when the segregation cue, i.e., speech fine structure, was removed. These results strongly suggest that, through mechanisms such as stream segregation, the auditory system can build an echo-insensitive representation of the speech envelope, which can support reliable speech recognition.
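The abstract's premise, that an echo can eliminate specific temporal modulations, can be illustrated with a minimal sketch (our illustration, not code from the paper): summing a signal with a delayed copy of itself approximately comb-filters the slow envelope modulations, fully cancelling the modulation frequency whose period is twice the echo delay. The delay value below is a hypothetical example.

```python
import numpy as np

# Adding an echo with delay d (seconds) multiplies an envelope modulation
# at frequency f (Hz) by the comb-filter gain |1 + exp(-2j*pi*f*d)|.
def modulation_gain(f, d):
    """Gain applied to an envelope modulation at frequency f by an echo of delay d."""
    return abs(1 + np.exp(-2j * np.pi * f * d))

d = 0.25  # hypothetical 250 ms echo delay, on the order of delays in online conferencing
print(modulation_gain(1 / (2 * d), d))  # ~0: the 2 Hz modulation is eliminated
print(modulation_gain(1 / d, d))        # 2.0: the 4 Hz modulation is reinforced
```

The first notch falls at 1/(2d), i.e., 2 Hz for a 250 ms delay, squarely within the slow modulation range that speech recognition is thought to rely on; this is why a long-delay echo can remove "crucial" modulations while a short-delay echo cannot.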