Development and Evaluation of Speech Synthesis System Based on Deep Learning Models

Alakbar Valizada; Sevil Jafarova; Emin Sultanov; Samir Rustamov

doi:10.3390/sym13050819

Symmetry (May 2021)

Affiliations

Alakbar Valizada: Artificial Intelligence Laboratory, ATL Tech, Jalil Mammadguluzadeh 102A, Baku 1022, Azerbaijan
Sevil Jafarova: Artificial Intelligence Laboratory, ATL Tech, Jalil Mammadguluzadeh 102A, Baku 1022, Azerbaijan
Emin Sultanov: Artificial Intelligence Laboratory, ATL Tech, Jalil Mammadguluzadeh 102A, Baku 1022, Azerbaijan
Samir Rustamov: School of Information Technologies and Engineering, ADA University, Ahmadbey Aghaoglu Str. 11, Baku 1008, Azerbaijan

DOI: https://doi.org/10.3390/sym13050819
Journal volume & issue: Vol. 13, no. 5, p. 819

Abstract

This study investigates, develops, and evaluates Text-to-Speech synthesis systems based on Deep Learning models for the Azerbaijani language. We selected and compared two state-of-the-art models, Tacotron and Deep Convolutional Text-to-Speech (DC TTS), to identify the better-performing one. Both systems were trained on a 24 h Azerbaijani speech dataset collected and processed from a news website. To analyze the quality and intelligibility of the speech signals produced by the two systems, 34 listeners took part in an online survey containing subjective evaluation tests. According to the Mean Opinion Score, Tacotron produced better results for In-Vocabulary words, whereas DC TTS performed better on the synthesis of Out-Of-Vocabulary words.
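
The Mean Opinion Score mentioned above is conventionally the arithmetic mean of listener ratings, often reported with a confidence interval. The following is a minimal sketch of that calculation, not the authors' evaluation code. It assumes a 1 to 5 rating scale and a normal-approximation 95% interval; the function name `mean_opinion_score` and the sample ratings are hypothetical and are not data from the study.

```python
from math import sqrt
from statistics import mean, stdev


def mean_opinion_score(ratings):
    """Return (MOS, half-width of an approximate 95% confidence interval).

    Assumes ratings on a 1-5 scale (an assumption, not stated in the abstract).
    """
    if not ratings:
        raise ValueError("ratings must not be empty")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    mos = mean(ratings)
    # Normal approximation; a single rating gives no spread to estimate.
    if len(ratings) > 1:
        half_width = 1.96 * stdev(ratings) / sqrt(len(ratings))
    else:
        half_width = 0.0
    return mos, half_width


# Hypothetical listener ratings for one synthesized utterance (illustrative only).
example_ratings = [4, 5, 3, 4, 4]
mos, ci = mean_opinion_score(example_ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

With scores from many listeners per system, the same function could be applied separately to the In-Vocabulary and Out-Of-Vocabulary test sets to compare the two models.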

Published in Symmetry

ISSN: 2073-8994 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Mathematics
Website: http://www.mdpi.com/journal/symmetry/
