End-to-End Mandarin Recognition based on Convolution Input

Wang Yanzhe; Zhang LiMin; Zhang Bingqiang; Li Zhenyu

doi:10.1051/matecconf/201821401004

MATEC Web of Conferences (Jan 2018)

DOI: https://doi.org/10.1051/matecconf/201821401004
Journal volume & issue: Vol. 214, p. 01004

Abstract

The cross-entropy criterion used in mainstream neural network training classifies and optimizes each frame of acoustic data, whereas continuous speech recognition uses sequence-level transcription accuracy as its performance measure. To address this mismatch, this paper constructs an end-to-end speech recognition system based on sequence-level transcription. The model uses a convolutional neural network to process the input features, selects the best network structure, and performs two-dimensional convolution in the time and frequency domains. The network also uses batch normalization to reduce generalization error and speed up training. Finally, the hyper-parameters of the decoding process are optimized to improve the modelling effect. Experimental results show that the system's performance improves substantially and exceeds that of mainstream speech recognition systems.
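
As a rough illustration of the two input-side operations the abstract names, the sketch below applies a two-dimensional convolution across the time and frequency axes of a toy feature map and then batch-normalizes the result. It is not the authors' network: the input values, the 2x2 filter, and the function names `conv2d_valid` and `batch_norm` are assumptions made up for this example, and the normalization omits the learned scale and shift that a trained layer would carry.

```python
import math


def conv2d_valid(x, kernel):
    """2D 'valid' cross-correlation over a (time x frequency) feature map."""
    kt, kf = len(kernel), len(kernel[0])
    out_t = len(x) - kt + 1
    out_f = len(x[0]) - kf + 1
    return [
        [
            sum(x[t + i][f + j] * kernel[i][j]
                for i in range(kt) for j in range(kf))
            for f in range(out_f)
        ]
        for t in range(out_t)
    ]


def batch_norm(batch, eps=1e-5):
    """Normalize a single-channel batch to zero mean and unit variance.

    Statistics are pooled over the batch, time, and frequency axes.
    The learned scale and shift parameters are omitted in this sketch.
    """
    values = [v for fmap in batch for row in fmap for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    scale = 1.0 / math.sqrt(var + eps)
    return [[[(v - mean) * scale for v in row] for row in fmap]
            for fmap in batch]


# Two toy "spectrograms": 4 time frames x 3 frequency bins (made-up values).
batch = [
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [10.0, 11.0, 12.0]],
    [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 1.0]],
]

# Hypothetical 2x2 filter spanning two time frames and two frequency bins.
kernel = [[1.0, 0.0], [0.0, -1.0]]

conv_out = [conv2d_valid(x, kernel) for x in batch]
normed = batch_norm(conv_out)
```

Each 4x3 input yields a 3x2 output, because a 2x2 filter without padding removes one position along each axis. In a real system the filter weights would be learned and the layer would carry many channels, each normalized separately.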

Published in MATEC Web of Conferences

ISSN: 2261-236X (Online)
Publisher: EDP Sciences
Country of publisher: France
LCC subjects: Technology: Engineering (General). Civil engineering (General)
Website: http://www.matec-conferences.org
