Jisuanji kexue (Apr 2022)

End-to-End Speech Synthesis Based on BERT

  • AN Xin, DAI Zi-biao, LI Yang, SUN Xiao, REN Fu-ji

DOI
https://doi.org/10.11896/jsjkx.210300071
Journal volume & issue
Vol. 49, no. 4
pp. 221 – 226

Abstract


To address the low training and prediction efficiency of RNN-based neural speech synthesis models and their loss of long-distance information, an end-to-end BERT-based speech synthesis method is proposed that uses the self-attention mechanism in place of an RNN as the encoder in a Seq2Seq speech synthesis architecture. The method uses a pre-trained BERT as the model's encoder to extract contextual information from the input text; the decoder outputs the Mel spectrogram using the same architecture as the speech synthesis model Tacotron2; finally, a trained WaveGlow network transforms the Mel spectrogram into the final audio. By fine-tuning the pre-trained BERT on the downstream task, the method significantly reduces the number of trainable parameters and the training time. At the same time, the self-attention mechanism allows the hidden states in the encoder to be computed in parallel, making full use of the parallel computing power of the GPU to improve training efficiency and effectively alleviate the long-range dependency problem. Comparison experiments with the Tacotron2 model show that the proposed model doubles the training speed while achieving results comparable to Tacotron2.
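The core idea of the encoder described above, computing all hidden states in parallel with self-attention rather than stepping through the sequence as an RNN must, can be illustrated with a minimal single-head scaled dot-product attention sketch. This is an illustrative NumPy version, not the paper's implementation; the weight matrices and dimensions here are placeholders.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    X: (seq_len, d_model) input embeddings.
    Wq, Wk, Wv: (d_model, d_k) projection matrices (placeholders).

    Every position attends to every other in one matrix multiply, so the
    whole sequence is processed in parallel on a GPU, unlike an RNN, which
    must recur over time steps one by one.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                # (seq_len, seq_len)
    # Numerically stable softmax over the last axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                             # (seq_len, d_k)

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 16))                       # 6 tokens, d_model = 16
Wq, Wk, Wv = (rng.normal(size=(16, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)                # shape (6, 8)
```

Because the attention scores directly connect every pair of positions, the path length between distant tokens is constant, which is why this design alleviates the long-range dependency problem the abstract mentions.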

Keywords