On the Utility of Syllable-Based Acoustic Models for Pronunciation Variation Modelling

Hämäläinen Annika; Boves Lou; de Veth Johan; ten Bosch Louis

EURASIP Journal on Audio, Speech, and Music Processing (Jan 2007)

Journal volume & issue: Vol. 2007, no. 1, p. 046460

Abstract

Recent research on the TIMIT corpus suggests that longer-length acoustic models are more appropriate for pronunciation variation modelling than the context-dependent phones that conventional automatic speech recognisers use. However, the impressive speech recognition results obtained with longer-length models on TIMIT remain to be reproduced on other corpora. To understand the conditions in which longer-length acoustic models result in considerable improvements in recognition performance, we carry out recognition experiments on both TIMIT and the Spoken Dutch Corpus and analyse the differences between the two sets of results. We establish that the details of the procedure used for initialising the longer-length models have a substantial effect on the speech recognition results. When initialised appropriately, longer-length acoustic models that borrow their topology from a sequence of triphones cannot capture the pronunciation variation phenomena that hinder recognition performance the most.

Published in EURASIP Journal on Audio, Speech, and Music Processing

ISSN: 1687-4722 (Online)
Publisher: SpringerOpen
Country of publisher: United Kingdom
LCC subjects: Science: Physics: Acoustics. Sound; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://asmp-eurasipjournals.springeropen.com
