Advances in Electrical and Computer Engineering (Feb 2015)

Enhancing ASR Systems for Under-Resourced Languages through a Novel Unsupervised Acoustic Model Training Technique

  • CUCU, H.,
  • BUZO, A.,
  • BESACIER, L.,
  • BURILEANU, C.

DOI: https://doi.org/10.4316/AECE.2015.01009
Journal volume & issue: Vol. 15, no. 1, pp. 63–68

Abstract


Statistical speech and language processing techniques, which require large amounts of training data, are currently state-of-the-art in automatic speech recognition. For high-resourced, international languages this data is widely available, while for under-resourced languages the lack of data poses serious problems. Unsupervised acoustic modeling can offer a cost- and time-effective way of creating a solid acoustic model for any under-resourced language. This study describes a novel unsupervised acoustic model training method and evaluates it on speech data in an under-resourced language: Romanian. The key novelty of the method is the use of two complementary seed ASR systems to produce high-quality transcriptions, with a Character Error Rate (ChER) below 5%, for initially untranscribed speech data. The methodology leads to a relative Word Error Rate (WER) improvement of more than 10% when 100 hours of untranscribed speech are used.
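The abstract outlines the general idea of selecting automatically transcribed utterances on which two complementary seed ASR systems agree. The sketch below illustrates one plausible reading of that idea, not the paper's actual procedure: it assumes each seed system's hypotheses come as a dictionary keyed by utterance ID, that agreement is measured as the character error rate between the two hypotheses, and that one hypothesis is adopted as the pseudo-transcription when agreement clears a 5% threshold.

    # Minimal sketch of agreement-based selection between two seed ASR systems.
    # All names and the selection criterion are illustrative assumptions, not
    # taken from the paper itself.

    def char_edit_distance(a: str, b: str) -> int:
        """Levenshtein distance at the character level."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            curr = [i]
            for j, cb in enumerate(b, 1):
                curr.append(min(prev[j] + 1,                 # deletion
                                curr[j - 1] + 1,             # insertion
                                prev[j - 1] + (ca != cb)))   # substitution
            prev = curr
        return prev[-1]

    def inter_hypothesis_cher(hyp_a: str, hyp_b: str) -> float:
        """Character error rate of hyp_a, treating hyp_b as the reference."""
        if not hyp_b:
            return 1.0
        return char_edit_distance(hyp_a, hyp_b) / len(hyp_b)

    def select_pseudo_transcriptions(hyps_a: dict, hyps_b: dict,
                                     max_cher: float = 0.05) -> dict:
        """Keep only utterances on which the two seed systems closely agree."""
        selected = {}
        for utt_id in hyps_a.keys() & hyps_b.keys():
            if inter_hypothesis_cher(hyps_a[utt_id], hyps_b[utt_id]) <= max_cher:
                # Adopt one system's hypothesis as the pseudo-transcription.
                selected[utt_id] = hyps_a[utt_id]
        return selected

    if __name__ == "__main__":
        hyps_a = {"utt1": "buna ziua tuturor", "utt2": "acesta este un test"}
        hyps_b = {"utt1": "buna ziua tuturor", "utt2": "aceasta nu este un text"}
        print(select_pseudo_transcriptions(hyps_a, hyps_b))  # only utt1 survives

The selected utterances, paired with their pseudo-transcriptions, would then be added to the acoustic model training set; the exact merging and filtering strategy used to reach the reported ChER < 5% is described in the full paper.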

Keywords