Towards end-to-end speech recognition with transfer learning

Chu-Xiong Qin; Dan Qu; Lian-Hai Zhang

doi:10.1186/s13636-018-0141-9

EURASIP Journal on Audio, Speech, and Music Processing (Nov 2018)

Affiliations

Chu-Xiong Qin: National Digital Switching System Engineering and Technological R&D Center
Dan Qu: National Digital Switching System Engineering and Technological R&D Center
Lian-Hai Zhang: National Digital Switching System Engineering and Technological R&D Center

DOI: https://doi.org/10.1186/s13636-018-0141-9
Journal volume & issue: Vol. 2018, no. 1, pp. 1–9

Abstract

We present a transfer learning-based end-to-end speech recognition approach that operates at two levels. First, a feature extraction approach that combines multilingual deep neural network (DNN) training with a matrix factorization algorithm is introduced to extract high-level features. Second, the advantage of connectionist temporal classification (CTC) is transferred to the target attention-based model through a joint CTC-attention model composed of shallow recurrent neural networks (RNNs) built on top of the proposed features. The experimental results show that the proposed transfer learning approach achieves the best performance among all end-to-end methods and, when further jointly decoded with an RNN language model, is comparable to the state-of-the-art speech recognition system for TIMIT.
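The abstract does not state the training objective of the joint CTC-attention model. As a rough illustration only, joint CTC-attention training is commonly formulated as a weighted interpolation of the two losses; the sketch below assumes that formulation. The function name, the `ctc_weight` parameter, and its default value are hypothetical and are not taken from the paper.

```python
def joint_ctc_attention_loss(ctc_loss: float,
                             attention_loss: float,
                             ctc_weight: float = 0.5) -> float:
    """Interpolate a CTC loss and an attention loss for one batch.

    Assumption: the joint objective is a convex combination of the two
    losses, as is common for joint CTC-attention training. The default
    weight of 0.5 is a placeholder, not a value reported by the authors.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    # ctc_weight = 0 reduces to pure attention training,
    # ctc_weight = 1 reduces to pure CTC training.
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * attention_loss
```

With this form, the CTC term acts as an auxiliary signal for the shared encoder while the attention decoder remains the target model, which matches the abstract's description of transferring the advantage of CTC to the attention-based model.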

Published in EURASIP Journal on Audio, Speech, and Music Processing

ISSN: 1687-4722 (Online)
Publisher: SpringerOpen
Country of publisher: United Kingdom
LCC subjects: Science: Physics: Acoustics. Sound; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://asmp-eurasipjournals.springeropen.com
