Journal of Systemics, Cybernetics and Informatics (Dec 2012)
Coding Methods for the NMF Approach to Speech Recognition and Vocabulary Acquisition
Abstract
This paper aims to improve the accuracy of the non-negative matrix factorization (NMF) approach to word learning and recognition of spoken utterances. We propose and compare three coding methods to alleviate the quantization errors involved in the vector quantization (VQ) of speech spectra: multi-codebooks, soft VQ, and adaptive VQ. We evaluate them on the task of spotting a vocabulary of 50 keywords in continuous speech. The error rate of multi-codebooks decreased as the number of codebooks increased, but the accuracy leveled off at around 5 to 10 codebooks. Soft VQ and adaptive VQ achieved a better trade-off between required memory and accuracy. The best of the proposed methods reduces the error rate to 1.2%, from the 1.9% obtained with a single codebook. The coding methods and the model framework may also prove useful for applications such as topic discovery/detection and mining of sequential patterns.
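To make the contrast between hard and soft VQ concrete, the following is a minimal sketch and not the paper's implementation. It assumes that an utterance is coded as a histogram over codewords, that soft VQ spreads each frame over its `top_k` nearest codewords, and that the weights follow a Gaussian kernel of width `sigma`. The function names, both parameters, the weighting scheme, and the toy codebook are illustrative assumptions; the abstract does not specify them.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hard_vq_histogram(frames, codebook):
    """Hard VQ: each frame adds a count of 1 to its single nearest codeword."""
    hist = [0.0] * len(codebook)
    for f in frames:
        nearest = min(range(len(codebook)), key=lambda k: sq_dist(f, codebook[k]))
        hist[nearest] += 1.0
    return hist


def soft_vq_histogram(frames, codebook, top_k=2, sigma=1.0):
    """Soft VQ (assumed form): each frame spreads a total weight of 1 over
    its top_k nearest codewords, using Gaussian-kernel weights."""
    hist = [0.0] * len(codebook)
    for f in frames:
        dists = [sq_dist(f, c) for c in codebook]
        nearest = sorted(range(len(codebook)), key=lambda k: dists[k])[:top_k]
        d_min = dists[nearest[0]]
        # Subtracting d_min keeps the exponentials numerically well behaved
        # and does not change the normalized weights.
        weights = [math.exp(-(dists[k] - d_min) / (2.0 * sigma ** 2)) for k in nearest]
        total = sum(weights)
        for k, w in zip(nearest, weights):
            hist[k] += w / total
    return hist


# Toy 2-D "spectra" and a 3-codeword codebook (hypothetical values).
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
frames = [[0.5, 0.0], [0.1, 0.9]]

hard = hard_vq_histogram(frames, codebook)
soft = soft_vq_histogram(frames, codebook)
```

In this toy case the first frame lies exactly halfway between two codewords. Hard VQ assigns it entirely to one of them, whereas the soft variant splits its weight evenly, which is the kind of quantization error the coding methods are meant to reduce. Under the same assumptions, a multi-codebook coding could be sketched by concatenating the histograms obtained from several independently trained codebooks.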