Journal of Systemics, Cybernetics and Informatics (Dec 2012)

Coding Methods for the NMF Approach to Speech Recognition and Vocabulary Acquisition

  • Meng Sun
  • Hugo Van Hamme

Journal volume & issue
Vol. 10, no. 6
pp. 94 – 99

Abstract

This paper aims at improving the accuracy of the non-negative matrix factorization approach to word learning and recognition of spoken utterances. We propose and compare three coding methods to alleviate quantization errors involved in the vector quantization (VQ) of speech spectra: multi-codebooks, soft VQ and adaptive VQ. We evaluate on the task of spotting a vocabulary of 50 keywords in continuous speech. The error rates of multi-codebooks decreased with increasing number of codebooks, but the accuracy leveled off around 5 to 10 codebooks. Soft VQ and adaptive VQ made a better trade-off between the required memory and the accuracy. The best of the proposed methods reduces the error rate to 1.2% from the 1.9% obtained with a single codebook. The coding methods and the model framework may also prove useful for applications such as topic discovery/detection and mining of sequential patterns.
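
To illustrate the kind of quantization error the abstract refers to, the following minimal sketch contrasts ordinary (hard) VQ, which maps a spectral frame to its single nearest codeword, with one possible form of soft VQ, which spreads a frame over several nearby codewords. This is an assumption-laden illustration, not the paper's formulation: the function names, the `top_k` and `temperature` parameters, the exponential weighting, and the toy two-dimensional codebook are all hypothetical.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hard_vq(frame, codebook):
    """Hard VQ: return the index of the single nearest codeword."""
    distances = [squared_distance(frame, c) for c in codebook]
    return min(range(len(codebook)), key=distances.__getitem__)


def soft_vq(frame, codebook, top_k=2, temperature=1.0):
    """Soft VQ (illustrative assumption): weight the top_k nearest codewords.

    Weights decay exponentially with squared distance and sum to 1.
    """
    distances = [squared_distance(frame, c) for c in codebook]
    nearest = sorted(range(len(codebook)), key=distances.__getitem__)[:top_k]
    d_min = distances[nearest[0]]  # subtract the minimum for numerical stability
    scores = {i: math.exp(-(distances[i] - d_min) / temperature) for i in nearest}
    total = sum(scores.values())
    return {i: s / total for i, s in scores.items()}


# Toy codebook and a frame lying almost halfway between codewords 0 and 1.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
frame = [0.45, 0.1]

hard_index = hard_vq(frame, codebook)      # keeps only the nearest codeword
soft_weights = soft_vq(frame, codebook)    # keeps both nearby codewords
```

In this toy case hard VQ discards the fact that the frame is nearly as close to the second codeword as to the first, whereas the soft assignment retains that information at the cost of storing more than one value per frame.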

Keywords