Tibetan speech synthesis based on an improved neural network

Ding Yuntao; Cai Rangzhuoma; Gong Baojia

doi:10.1051/matecconf/202133606012

MATEC Web of Conferences (Jan 2021)

Journal volume & issue: Vol. 336, p. 06012

Abstract

Neural-network-based methods have become the mainstream approach to Tibetan speech synthesis. Within these systems, the Griffin-Lim vocoder is widely used because it is relatively simple to apply. To address the low fidelity of the Griffin-Lim vocoder, this paper uses a WaveNet vocoder in its place for Tibetan speech synthesis. The model first uses convolution operations and an attention mechanism to extract sequence features. It then uses a linear projection and a feature amplification module to predict the mel spectrogram. Finally, a WaveNet vocoder synthesizes the speech waveform. Experimental data show that the model performs better in Tibetan speech synthesis.
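The mel spectrogram named in the abstract is the intermediate representation passed to the vocoder. As a rough illustration of what "mel" means here, the sketch below spaces frequency bands evenly on the mel scale using the common HTK-style formula. The abstract does not give the paper's actual settings, so the formula variant, the 80 bands, and the 0 to 8000 Hz range are assumptions chosen for illustration only, and the function names are hypothetical.

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels (HTK-style formula, an assumption)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_mels=80, fmin_hz=0.0, fmax_hz=8000.0):
    """Centre frequencies (Hz) of n_mels bands spaced evenly on the mel scale.

    n_mels, fmin_hz and fmax_hz are illustrative defaults, not values
    taken from the paper.
    """
    mel_min = hz_to_mel(fmin_hz)
    mel_max = hz_to_mel(fmax_hz)
    # n_mels triangular filters need n_mels + 2 edge points;
    # the centres are the interior points.
    step = (mel_max - mel_min) / (n_mels + 1)
    return [mel_to_hz(mel_min + step * (i + 1)) for i in range(n_mels)]


centers = mel_center_frequencies()
```

The resulting bands are narrow at low frequencies and wide at high frequencies, which is why a mel spectrogram is much more compact than a linear-frequency spectrogram. It carries no phase information, so a vocoder such as Griffin-Lim or WaveNet has to reconstruct the waveform from it.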

Published in MATEC Web of Conferences

ISSN: 2261-236X (Online)
Publisher: EDP Sciences
Country of publisher: France
LCC subjects: Technology: Engineering (General). Civil engineering (General)
Website: http://www.matec-conferences.org
