Advances in Electrical and Computer Engineering (Feb 2017)

Comparison of Cepstral Normalization Techniques in Whispered Speech Recognition

  • GROZDIC, D.,
  • JOVICIC, S.,
  • SUMARAC PAVLOVIC, D.,
  • GALIC, J.,
  • MARKOVIC, B.

DOI
https://doi.org/10.4316/AECE.2017.01004
Journal volume & issue
Vol. 17, no. 1
pp. 21 – 26

Abstract

This article presents an analysis of different cepstral normalization techniques in automatic recognition of whispered and bimodal speech (speech+whisper). In these experiments, a conventional GMM-HMM speech recognizer was used as a speaker-dependent automatic speech recognition system, together with the special Whi-Spe corpus, which contains utterance recordings in normally phonated speech and in whisper. The following normalization techniques were tested and compared: CMN (Cepstral Mean Normalization), CVN (Cepstral Variance Normalization), MVN (Cepstral Mean and Variance Normalization), CGN (Cepstral Gain Normalization), and quantile-based dynamic normalization techniques such as QCN and QCN-RASTA. The experimental results show to what extent each of these cepstral normalization techniques can improve whisper recognition accuracy in a mismatched train/test scenario. The best result is obtained using CMN in combination with inverse filtering, which provides an average 39.9 percent improvement in whisper recognition accuracy for all tested speakers.
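
As a rough illustration of the three simplest techniques named above, the following sketch applies CMN, CVN and MVN to a sequence of cepstral feature frames. It is not the authors' implementation. The function names, the use of per-utterance statistics, and the `eps` guard against zero variance are assumptions made for this example; the abstract does not state how the statistics were computed. CGN, QCN, QCN-RASTA and inverse filtering are not covered.

```python
from statistics import fmean, pstdev


def cmn(frames):
    """Cepstral Mean Normalization: subtract each coefficient's mean.

    `frames` is a list of frames, each a list of cepstral coefficients.
    Statistics are taken over the whole utterance (an assumption).
    """
    means = [fmean(col) for col in zip(*frames)]
    return [[x - m for x, m in zip(frame, means)] for frame in frames]


def cvn(frames, eps=1e-12):
    """Cepstral Variance Normalization: scale each coefficient to unit variance.

    `eps` is a hypothetical guard against division by zero for a
    coefficient that is constant over the utterance.
    """
    stds = [pstdev(col) for col in zip(*frames)]
    return [[x / max(s, eps) for x, s in zip(frame, stds)] for frame in frames]


def mvn(frames, eps=1e-12):
    """Mean and Variance Normalization: zero mean, then unit variance."""
    # Removing the mean does not change the standard deviation,
    # so applying CVN after CMN yields zero-mean, unit-variance features.
    return cvn(cmn(frames), eps)


if __name__ == "__main__":
    # Two toy frames with two coefficients each (made-up values).
    toy = [[1.0, 10.0], [3.0, 14.0]]
    print(cmn(toy))
    print(mvn(toy))
```

All three functions remove or rescale statistics that stay constant over an utterance, which is the general motivation for using such normalization when training and test conditions differ.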

Keywords