PLoS ONE (Jan 2020)
Speaking to a common tune: Between-speaker convergence in voice fundamental frequency in a joint speech production task.
Abstract
Recent research on speech communication has revealed a tendency for speakers to imitate at least some of the characteristics of their interlocutor's speech sound shape. This phenomenon, referred to as phonetic convergence, entails a moment-to-moment adaptation of the speaker's speech targets to the interlocutor's speech as the speaker perceives it. It is thought to contribute to setting up a conversational common ground between speakers and to facilitate mutual understanding. However, it remains uncertain to what extent phonetic convergence occurs in voice fundamental frequency (F0), in spite of the major role played by pitch, F0's perceptual correlate, as a conveyor of both linguistic information and communicative cues associated with the speaker's social/individual identity and emotional state. In the present work, we investigated to what extent two speakers converge towards each other with respect to variations in F0 in a scripted dialogue. Pairs of speakers jointly performed a speech production task, in which they were asked to alternately read aloud a written story divided into a sequence of short reading turns. We devised an experimental set-up that allowed us to manipulate the speakers' F0 in real time across turns. We found that speakers tended to imitate each other's changes in F0 across turns, changes that were both limited in amplitude and spread over large temporal intervals. This shows that, at the perceptual level, speakers monitor slowly varying movements in their partner's F0 with high accuracy and that, at the production level, they exert very fine-tuned control over their laryngeal vibrator in order to imitate these F0 variations. Remarkably, F0 convergence across turns was found to occur in spite of the large melodic variations typically associated with reading turns. Our study sheds new light on speakers' perceptual tracking of F0 in speech processing, and on the impact of this perceptual tracking on speech production.