Language Model Adaptation Using Machine-Translated Text for Resource-Deficient Languages

Sadaoki Furui; Koji Iwano; Arnar Thor Jensson

doi:10.1155/2008/573832

EURASIP Journal on Audio, Speech, and Music Processing (Jan 2009)

DOI: https://doi.org/10.1155/2008/573832
Journal volume & issue: Vol. 2008

Abstract

Text corpus size is an important issue when building a language model (LM), particularly for languages where little data is available. This paper introduces an LM adaptation technique that improves an LM built from a small amount of task-dependent text with the help of a machine-translated text corpus. Icelandic speech recognition experiments were performed using data machine-translated (MT) from English to Icelandic on a word-by-word and sentence-by-sentence basis. Interpolating the baseline LM with an LM built from either word-by-word or sentence-by-sentence translated text reduced the word error rate significantly when the manually obtained utterances used as a baseline were very sparse.
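As a rough illustration of the LM interpolation mentioned in the abstract, the sketch below linearly combines a small baseline LM with an LM built from machine-translated text. It is a minimal sketch under stated assumptions, not the paper's implementation: the unigram models, the add-one smoothing, the toy token lists, the grid search over the weight `lam`, and all function names are hypothetical choices made for illustration only.

```python
import math
from collections import Counter


def unigram_lm(tokens, vocab):
    """Add-one smoothed unigram probabilities over a fixed vocabulary.

    Assumes every token is in vocab, so the probabilities sum to 1.
    """
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def interpolate(p_base, p_mt, lam):
    """Linear interpolation: lam * P_base(w) + (1 - lam) * P_mt(w)."""
    return {w: lam * p_base[w] + (1.0 - lam) * p_mt[w] for w in p_base}


def perplexity(lm, tokens):
    """Perplexity of a unigram LM on a token sequence."""
    log_prob = sum(math.log(lm[w]) for w in tokens)
    return math.exp(-log_prob / len(tokens))


# Toy stand-ins (assumptions): a sparse task-dependent corpus, a larger
# machine-translated corpus, and a held-out set used to tune the weight.
base_text = "a b a c".split()
mt_text = "a b b d d c a b".split()
heldout = "a b d c".split()
vocab = sorted(set(base_text) | set(mt_text) | set(heldout))

p_base = unigram_lm(base_text, vocab)
p_mt = unigram_lm(mt_text, vocab)

# Pick the interpolation weight on a coarse grid by held-out perplexity.
grid = [i / 10 for i in range(11)]
best_lam = min(grid, key=lambda lam: perplexity(interpolate(p_base, p_mt, lam), heldout))
p_mix = interpolate(p_base, p_mt, best_lam)

print(f"best lambda: {best_lam:.1f}")
print(f"held-out perplexity: {perplexity(p_mix, heldout):.3f}")
```

Because the grid includes the endpoints 0 and 1, the selected mixture is never worse on the held-out set than either component model alone. A real system would apply the same weighting to n-gram probabilities conditioned on word history rather than to unigrams.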

Published in EURASIP Journal on Audio, Speech, and Music Processing

ISSN: 1687-4722 (Online)
Publisher: SpringerOpen
Country of publisher: United Kingdom
LCC subjects: Science: Physics: Acoustics. Sound; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://asmp-eurasipjournals.springeropen.com