On the Importance of Audiovisual Coherence for the Perceived Quality of Synthesized Visual Speech

Mattheyses Wesley; Latacz Lukas; Verhelst Werner

EURASIP Journal on Audio, Speech, and Music Processing (Jan 2009)

Journal volume & issue: Vol. 2009, no. 1, p. 169819

Abstract

Audiovisual text-to-speech systems convert a written text into an audiovisual speech signal. Typically, the visual mode of the synthetic speech is synthesized separately from the audio, the latter being either natural or synthesized speech. However, the perception of mismatches between these two information streams requires experimental exploration, since such mismatches could degrade the quality of the output. In order to increase the intermodal coherence in synthetic 2D photorealistic speech, we extended the well-known unit selection audio synthesis technique to work with multimodal segments containing original combinations of audio and video. Subjective experiments confirm that the audiovisual signals created by our multimodal synthesis strategy are indeed perceived as being more synchronous than those of systems in which the two modes are not intrinsically coherent. Furthermore, it is shown that the degree of coherence between the auditory mode and the visual mode influences the perceived quality of the synthetic visual speech fragment. In addition, the audio quality was found to have only a minor influence on the perceived quality of the visual signal.

Published in EURASIP Journal on Audio, Speech, and Music Processing

ISSN: 1687-4722 (Online)
Publisher: SpringerOpen
Country of publisher: United Kingdom
LCC subjects: Science: Physics: Acoustics. Sound; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://asmp-eurasipjournals.springeropen.com
