MATEC Web of Conferences (Jan 2021)

Tibetan speech synthesis based on an improved neural network

  • Ding Yuntao,
  • Cai Rangzhuoma,
  • Gong Baojia

DOI
https://doi.org/10.1051/matecconf/202133606012
Journal volume & issue
Vol. 336
p. 06012

Abstract

Neural-network-based methods have become the mainstream approach to Tibetan speech synthesis. Among the available vocoders, the Griffin-Lim vocoder is widely used in Tibetan speech synthesis because it is relatively simple. To address the low fidelity of the Griffin-Lim vocoder, this paper replaces it with a WaveNet vocoder. The model first uses convolution operations and an attention mechanism to extract sequence features, then uses a linear projection and a feature amplification module to predict the mel spectrogram, and finally uses the WaveNet vocoder to synthesize the speech waveform. Experimental data show that the model performs better in Tibetan speech synthesis.
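
For readers unfamiliar with the baseline, the following is a minimal NumPy sketch of generic Griffin-Lim phase reconstruction. It is not the authors' implementation. The function names and the values of `n_fft`, `hop`, and `n_iter` are illustrative assumptions, and the sketch works directly on a linear magnitude spectrogram rather than on a predicted mel spectrogram. It shows where the simplicity comes from: the phase is not modelled, only estimated by alternating between the time and frequency domains. That estimation is commonly cited as a source of the fidelity gap that a neural vocoder such as WaveNet aims to close.

```python
import numpy as np

def stft(x, n_fft=256, hop=64):
    """Short-time Fourier transform with a Hann window; returns (frames, bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, n_fft=256, hop=64):
    """Inverse STFT by windowed overlap-add, normalized by the summed window energy."""
    window = np.hanning(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * window
    length = n_fft + hop * (len(spec) - 1)
    signal = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        signal[i * hop:i * hop + n_fft] += frame
        norm[i * hop:i * hop + n_fft] += window ** 2
    return signal / np.maximum(norm, 1e-8)

def griffin_lim(magnitude, n_iter=50, n_fft=256, hop=64, seed=0):
    """Estimate a waveform whose STFT magnitude approximates `magnitude`."""
    rng = np.random.default_rng(seed)
    # Start from random phase: the magnitude alone carries no phase information.
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(n_iter):
        signal = istft(magnitude * phase, n_fft, hop)
        rebuilt = stft(signal, n_fft, hop)
        # Keep the target magnitude, adopt the phase of the re-analysed signal.
        phase = np.exp(1j * np.angle(rebuilt))
    return istft(magnitude * phase, n_fft, hop)

# Usage: discard the phase of a test tone, then estimate a waveform from magnitude only.
t = np.arange(256 + 64 * 30)
wave = np.sin(2 * np.pi * 220 * t / 8000)
target = np.abs(stft(wave))
recovered = griffin_lim(target)
```

The recovered waveform matches the target magnitude only approximately, and its phase is an estimate rather than the original.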