Journal of Electrical and Electronics Engineering (Oct 2015)

Modeling of Slovak Language for Broadcast News Transcription

  • STAŠ Ján,
  • JUHÁR Jozef

Journal volume & issue
Vol. 8, no. 2
pp. 43 – 46

Abstract


The paper describes recent progress in the development of Slovak language models for the transcription of spontaneous speech such as broadcast news, educational talks and lectures, or meetings. This work extends previous research focused on the automatic transcription of dictated speech. It introduces several extensions that improve the perplexity and robustness of Slovak language models trained on web-based and electronic language resources, making them more accurate in recognizing spontaneous speech. These improvements include better text preprocessing, document classification, class-based and filled-pause modeling, web-data augmentation, and fast model adaptation to the target domain. Experiments have been performed with the Slovak transcription system on four different evaluation data sets: judicial and newspaper readings, broadcast news recordings, and parliament proceedings. Preliminary results show a significant decrease in the word error rate for multiple transcription system configurations of acoustic and language models.
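The results above are reported as word error rate (WER): the minimum number of word substitutions, deletions, and insertions needed to turn the recognizer's hypothesis into the reference transcript, divided by the number of reference words. The following is a minimal Python sketch of that standard definition, assuming whitespace tokenization and no text normalization. It is not the scoring tool used in the paper, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level Levenshtein distance / number of reference words.

    Assumes plain whitespace tokenization (hypothetical simplification;
    real scoring usually normalizes case, punctuation, and filled pauses).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # Two inflectional substitutions out of four reference words
    print(word_error_rate("dobrý večer vážení diváci", "dobrý večer vážený divák"))
```

The example illustrates why inflected languages such as Slovak are demanding: a hypothesis with the right lemmas but wrong word endings still counts every ending error as a full substitution. Because insertions are counted, WER can also exceed 1.0.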

Keywords