IEEE Access (Jan 2020)

Projected Minimal Gated Recurrent Unit for Speech Recognition

  • Renjian Feng,
  • Weijie Jiang,
  • Ning Yu,
  • Yinfeng Wu,
  • Jiaxuan Yan

DOI
https://doi.org/10.1109/ACCESS.2020.3041477
Journal volume & issue
Vol. 8
pp. 215192 – 215201

Abstract


Recurrent neural networks (RNNs) can learn long-term dependencies, which makes them suitable for acoustic modeling in speech recognition. In this paper, we revise an RNN model used in acoustic modeling, namely mGRUIP with a context module (mGRUIP-Ctx), and propose an improved model named the Projected minimal Gated Recurrent Unit (PmGRU). The paper makes two major contributions. First, because adding context information to the context module in mGRUIP-Ctx introduces a large number of parameters, we insert a smaller output projection layer after the mGRUIP-Ctx cell's output to form the PmGRU, inspired by the idea of low-rank matrix decomposition. The output projection layer is shown to preserve most of the effective information while reducing the number of model parameters. Second, because too much context information from the previous layer introduced by the context module degrades model performance, we adjust the ratio of context information from the previous layer to the current layer by moving the position of the batch normalization layer, yielding the final RNN model, the Normalization Projected minimal Gated Recurrent Unit (Norm-PmGRU). Experiments on five automatic speech recognition (ASR) tasks show that Norm-PmGRU is more effective than mGRUIP-Ctx, TDNN-OPGRU, TDNN-LSTMP, and other RNN baseline acoustic models.
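
As a rough illustration of the first contribution, the sketch below (not the authors' implementation) wraps a recurrent cell with a smaller linear output projection in the spirit of PmGRU's low-rank output projection. A standard GRU cell stands in for the paper's mGRUIP-Ctx cell with its context module, and all class names and layer sizes are illustrative assumptions.

import torch
import torch.nn as nn

class ProjectedRecurrentCell(nn.Module):
    """Hypothetical sketch: recurrent cell followed by a low-rank output projection."""
    def __init__(self, input_size, hidden_size, proj_size):
        super().__init__()
        # A plain GRU cell is used here as a stand-in for mGRUIP-Ctx.
        self.cell = nn.GRUCell(input_size, hidden_size)
        # Output projection: hidden_size -> proj_size (proj_size < hidden_size),
        # which shrinks the weight matrices of every layer consuming this output.
        self.proj = nn.Linear(hidden_size, proj_size, bias=False)

    def forward(self, x, h):
        h = self.cell(x, h)          # update the hidden state
        return self.proj(h), h       # projected output, full hidden state

# Example: a 1024-unit hidden state projected down to 256 dimensions.
cell = ProjectedRecurrentCell(input_size=40, hidden_size=1024, proj_size=256)
h = torch.zeros(8, 1024)
y, h = cell(torch.randn(8, 40), h)
print(y.shape)  # torch.Size([8, 256])

The design intuition, per the abstract, is that the projection acts like a low-rank factorization of the output weight matrix: downstream layers see a 256-dimensional vector instead of a 1024-dimensional one, cutting parameters while retaining most of the useful information.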

Keywords