Model-Based Synthesis of Visual Speech Movements from 3D Video

James D. Edge; Adrian Hilton; Philip Jackson

doi:10.1155/2009/597267

EURASIP Journal on Audio, Speech, and Music Processing (Jan 2009)

DOI: https://doi.org/10.1155/2009/597267
Journal volume & issue: Vol. 2009

Abstract

We describe a method for synthesising visual speech movements using a hybrid unit selection/model-based approach. Speech lip movements are captured using a 3D stereo face capture system and segmented into phonetic units. A dynamic parameterisation of this data is constructed that maintains the relationship between lip shapes and velocities. Within this parameterisation, a model of how lips move is built and used to animate visual speech movements from speech audio input. The mapping from audio parameters to lip movements is disambiguated by selecting, during synthesis, only the stored phonetic units most similar to the target utterance. By combining properties of model-based synthesis (e.g., HMMs, neural nets) with unit selection, we improve the quality of our speech synthesis.
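
The two ideas in the abstract, a parameterisation that keeps lip shape and velocity together and the selection of the most similar stored units, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `dynamic_features` and `select_units`, the finite-difference velocity estimate, the Euclidean distance, and the representation of each unit as a dictionary with a fixed-length `"audio"` vector are all assumptions made for illustration.

```python
import numpy as np


def dynamic_features(frames):
    """Pair each lip-shape frame with a velocity estimate.

    frames: array-like of shape (n_frames, n_shape_params), n_frames >= 2.
    Velocity is approximated by finite differences along the time axis
    (an assumption; the abstract does not say how velocity is obtained).
    """
    frames = np.asarray(frames, dtype=float)
    velocities = np.gradient(frames, axis=0)
    # Each row becomes [shape parameters, velocity parameters].
    return np.hstack([frames, velocities])


def select_units(target_audio, stored_units, k=2):
    """Return indices of the k stored units closest to the target.

    Each stored unit is a hypothetical dict with an "audio" entry holding
    a fixed-length feature vector. Similarity is plain Euclidean distance,
    chosen here only for simplicity.
    """
    target = np.asarray(target_audio, dtype=float)
    distances = [
        np.linalg.norm(np.asarray(unit["audio"], dtype=float) - target)
        for unit in stored_units
    ]
    # argsort orders units from most to least similar; keep the first k.
    return [int(i) for i in np.argsort(distances)[:k]]
```

In a sketch like this, a lip-movement model would then be fitted only to the dynamic features of the selected units, which is one way to narrow the audio-to-lip mapping as the abstract describes.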

Published in EURASIP Journal on Audio, Speech, and Music Processing

ISSN: 1687-4722 (Online)
Publisher: SpringerOpen
Country of publisher: United Kingdom
LCC subjects: Science: Physics: Acoustics. Sound; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://asmp-eurasipjournals.springeropen.com