Naučno-tehničeskij Vestnik Informacionnyh Tehnologij, Mehaniki i Optiki (Nov 2016)
GAUSSIAN MIXTURE MODELS FOR ADAPTATION OF DEEP NEURAL NETWORK ACOUSTIC MODELS IN AUTOMATIC SPEECH RECOGNITION SYSTEMS
Abstract
Subject of Research. We study speaker adaptation of deep neural network (DNN) acoustic models in automatic speech recognition systems. The aim of speaker adaptation techniques is to improve the accuracy of the speech recognition system for a particular speaker. Method. A novel method for training and adaptation of deep neural network acoustic models has been developed. It is based on using an auxiliary Gaussian Mixture Model (GMM) and GMM-derived (GMMD) features. The principal advantage of the proposed GMMD features is the possibility of adapting a DNN through the adaptation of the auxiliary GMM. Since any adaptation method for the auxiliary GMM can be applied, the proposed approach provides a universal way of transferring adaptation algorithms developed for GMMs to DNN adaptation. Main Results. The effectiveness of the proposed approach was demonstrated with one of the most common adaptation algorithms for GMMs, MAP (Maximum A Posteriori) adaptation. Different ways of integrating the proposed approach into a state-of-the-art DNN architecture have been proposed and explored. An analysis of the choice of the auxiliary GMM type is given. Experimental results on the TED-LIUM corpus demonstrate that, in an unsupervised adaptation mode, the proposed adaptation technique provides approximately an 11–18% relative word error rate (WER) reduction on different adaptation sets, compared to the speaker-independent DNN system built on conventional features, and a 3–6% relative WER reduction compared to the SAT-DNN trained on fMLLR-adapted features.
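The sketch below is not the authors' implementation; it only illustrates, under simplified assumptions, the two ideas named in the abstract: (1) GMMD features are taken here as the vector of weighted per-component log-likelihoods of an acoustic frame under an auxiliary GMM with diagonal covariances, and (2) speaker adaptation of that GMM is done by MAP re-estimation of the component means, which changes the GMMD features fed to the DNN without touching the DNN weights. The function names (gmmd_features, map_adapt_means) and the relevance factor tau are illustrative, not taken from the paper.

import numpy as np

def log_gauss_diag(x, means, variances):
    # Per-component log N(x; mean_k, diag(var_k)) for one frame x of dimension d.
    d = x.shape[0]
    diff = x - means                                   # (K, d)
    return -0.5 * (d * np.log(2 * np.pi)
                   + np.sum(np.log(variances), axis=1)
                   + np.sum(diff ** 2 / variances, axis=1))

def gmmd_features(frames, weights, means, variances):
    # Map each acoustic frame to a K-dimensional GMMD vector of weighted
    # per-component log-likelihoods under the auxiliary GMM.
    return np.stack([np.log(weights) + log_gauss_diag(x, means, variances)
                     for x in frames])                 # (T, K)

def map_adapt_means(frames, weights, means, variances, tau=10.0):
    # MAP adaptation of the GMM means on a speaker's adaptation data.
    # tau is the relevance factor: larger tau stays closer to the
    # speaker-independent means when adaptation data are scarce.
    log_p = gmmd_features(frames, weights, means, variances)   # (T, K)
    log_p -= log_p.max(axis=1, keepdims=True)
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)           # responsibilities
    n_k = gamma.sum(axis=0)                             # (K,) soft counts
    x_bar = gamma.T @ frames / np.maximum(n_k[:, None], 1e-10)  # ML mean estimates
    alpha = n_k / (n_k + tau)                           # interpolation weights
    return alpha[:, None] * x_bar + (1 - alpha[:, None]) * means

# Usage: adapt the auxiliary GMM on a speaker's data, then recompute the GMMD
# features for that speaker and pass them to the (unchanged) DNN acoustic model.
rng = np.random.default_rng(0)
K, d = 8, 13                                           # components, feature dimension
weights = np.full(K, 1.0 / K)
means = rng.normal(size=(K, d))
variances = np.ones((K, d))
adapt_frames = rng.normal(size=(200, d))               # speaker adaptation data
adapted_means = map_adapt_means(adapt_frames, weights, means, variances)
feats = gmmd_features(adapt_frames, weights, adapted_means, variances)
print(feats.shape)                                     # (200, 8) -> DNN input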
Keywords