Grammar-Supervised End-to-End Speech Recognition with Part-of-Speech Tagging and Dependency Parsing

Genshun Wan; Tingzhi Mao; Jingxuan Zhang; Hang Chen; Jianqing Gao; Zhongfu Ye

doi:10.3390/app13074243

Applied Sciences (Mar 2023)

Affiliations

Genshun Wan: National Engineering Research Center of Speech and Language Information Processing, University of Science and Technology of China, Hefei 230088, China
Tingzhi Mao: iFLYTEK Research, iFLYTEK Co., Ltd., Hefei 230088, China
Jingxuan Zhang: iFLYTEK Research, iFLYTEK Co., Ltd., Hefei 230088, China
Hang Chen: National Engineering Research Center of Speech and Language Information Processing, University of Science and Technology of China, Hefei 230088, China
Jianqing Gao: iFLYTEK Research, iFLYTEK Co., Ltd., Hefei 230088, China
Zhongfu Ye: National Engineering Research Center of Speech and Language Information Processing, University of Science and Technology of China, Hefei 230088, China

DOI: https://doi.org/10.3390/app13074243
Journal volume & issue: Vol. 13, no. 7, p. 4243

Abstract

In most automatic speech recognition systems, unacceptable hypothesis errors still make the recognition results absurd and difficult to understand. In this paper, we introduce grammar information to reduce the grammatical deviation distance and increase the readability of the hypothesis. Word embeddings are reinforced with grammar embeddings to strengthen the grammar expression. An auxiliary text-to-grammar task is provided to improve the recognition results under downstream task evaluation. Furthermore, multiple grammar evaluation methods are used to explore an extensible paradigm for using grammar knowledge. Experiments on the small open-source Mandarin speech corpus AISHELL-1 and the large private Mandarin speech corpus TRANS-M show that our method performs well with no additional data. Our method achieves relative character error rate reductions of 3.2% and 5.0% and relative grammatical deviation distance reductions of 4.7% and 5.9% on the AISHELL-1 and TRANS-M tasks, respectively. Moreover, the grammar-based mean opinion scores of our method are about 4.29 and 3.20, significantly superior to the baseline scores of 4.11 and 3.02.
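
For readers unfamiliar with the metrics, the sketch below illustrates how a character error rate and a relative reduction are conventionally computed. It is not taken from the paper: the function names, the example sentences, and the numeric values passed to relative_reduction are hypothetical, and the abstract does not state the absolute error rates behind the reported 3.2% and 5.0% figures.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, keeping two DP rows."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (cost 0 on a match)
            ))
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance divided by reference length."""
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline, proposed):
    """Relative reduction of an error metric with respect to a baseline."""
    return (baseline - proposed) / baseline


# Hypothetical example: one substituted character in a six-character reference.
example_cer = cer("今天天气很好", "今天天汽很好")

# Hypothetical values: a drop from 10.0 to 9.5 is a 5% relative reduction.
example_gain = relative_reduction(10.0, 9.5)
```

Because Python strings iterate by character, the same cer function applies directly to Mandarin text without word segmentation.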

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
