ETRI Journal (Feb 2019)

Rank‐weighted reconstruction feature for a robust deep neural network‐based acoustic model

  • Hoon Chung,
  • Jeon Gue Park,
  • Ho‐Young Jung

DOI
https://doi.org/10.4218/etrij.2018-0189
Journal volume & issue
Vol. 41, no. 2
pp. 235 – 241

Abstract

Read online

In this paper, we propose a rank‐weighted reconstruction feature to improve the robustness of a feed‐forward deep neural network (FFDNN)‐based acoustic model. In the FFDNN‐based acoustic model, an input feature is constructed by vectorizing a submatrix that is created by slicing the feature vectors of frames within a context window. In this type of feature construction, the appropriate context window size is important because it determines the amount of trivial or discriminative information, such as redundancy, or temporal context of the input features. However, we ascertained whether a single parameter is sufficiently able to control the quantity of information. Therefore, we investigated the input feature construction from the perspectives of rank and nullity, and proposed a rank‐weighted reconstruction feature herein, that allows for the retention of speech information components and the reduction in trivial components. The proposed method was evaluated in the TIMIT phone recognition and Wall Street Journal (WSJ) domains. The proposed method reduced the phone error rate of the TIMIT domain from 18.4% to 18.0%, and the word error rate of the WSJ domain from 4.70% to 4.43%.

Keywords