Jisuanji kexue yu tansuo (Sep 2021)

Research Status and Prospect of Transformer in Speech Recognition

  • ZHANG Xiaoxu, MA Zhiqiang, LIU Zhiqiang, ZHU Fangyuan, WANG Chunyu

DOI
https://doi.org/10.3778/j.issn.1673-9418.2103020
Journal volume & issue
Vol. 15, no. 9
pp. 1578–1594

Abstract

As a new deep learning framework, Transformer has attracted increasing attention from researchers and has become a current research hotspot. Inspired by the way humans focus only on what is important, the self-attention mechanism in the Transformer model learns the important information in an input sequence. For speech recognition tasks, the goal is to transcribe an input speech sequence into the corresponding text. The traditional practice was to combine an acoustic model, a pronunciation dictionary, and a language model into a speech recognition system, whereas Transformer can integrate these components into a single neural network to form an end-to-end speech recognition system, which avoids problems of the traditional pipeline such as forced alignment and multi-module training. It is therefore worthwhile to discuss the problems Transformer faces in speech recognition tasks. This paper first introduces the structure of the Transformer model. It then analyzes the problems confronting speech recognition with respect to the input speech sequence, the deep model architecture, and model inference, and it outlines and summarizes methods for overcoming the obstacles in these three aspects. Finally, future applications and directions of Transformer in speech recognition are summarized and discussed.
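
For readers unfamiliar with the mechanism the abstract refers to, the following is a minimal NumPy sketch of single-head scaled dot-product self-attention, the core operation of the Transformer. The shapes, variable names, and random toy data are illustrative assumptions, not taken from the surveyed paper.

```python
import numpy as np

def scaled_dot_product_self_attention(x, w_q, w_k, w_v):
    """Self-attention over a sequence x of shape (T, d_model).

    Each output position is a weighted sum of all value vectors,
    with weights softmax(Q K^T / sqrt(d_k)); large weights mark
    the positions the model treats as important.
    """
    q = x @ w_q  # queries, shape (T, d_k)
    k = x @ w_k  # keys,    shape (T, d_k)
    v = x @ w_v  # values,  shape (T, d_v)
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)  # (T, T) pairwise similarities
    # Row-wise softmax, shifted by the row max for numerical stability.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v  # (T, d_v) context vectors

# Toy example: 5 acoustic frames with 8-dimensional features.
rng = np.random.default_rng(0)
T, d_model, d_k = 5, 8, 4
x = rng.standard_normal((T, d_model))
w_q, w_k, w_v = (rng.standard_normal((d_model, d_k)) for _ in range(3))
print(scaled_dot_product_self_attention(x, w_q, w_k, w_v).shape)  # (5, 4)
```

Because every frame attends to every other frame in one step, this operation replaces the recurrence of earlier acoustic models, which is what makes the end-to-end Transformer systems discussed in the paper possible.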

Keywords