MAKEDONKA: Applied Deep Learning Model for Text-to-Speech Synthesis in Macedonian Language

Kostadin Mishev; Aleksandra Karovska Ristovska; Dimitar Trajanov; Tome Eftimov; Monika Simjanoska

doi:10.3390/app10196882

Applied Sciences (Oct 2020)

MAKEDONKA: Applied Deep Learning Model for Text-to-Speech Synthesis in Macedonian Language

Kostadin Mishev,
Aleksandra Karovska Ristovska,
Dimitar Trajanov,
Tome Eftimov,
Monika Simjanoska

Affiliations

Kostadin Mishev: Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University, 1000 Skopje, North Macedonia
Aleksandra Karovska Ristovska: Faculty of Philosophy, Ss. Cyril and Methodius University, 1000 Skopje, North Macedonia
Dimitar Trajanov: Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University, 1000 Skopje, North Macedonia
Tome Eftimov: Computer Systems Department, Jožef Stefan Institute, 1000 Ljubljana, Slovenia
Monika Simjanoska: Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University, 1000 Skopje, North Macedonia

DOI: https://doi.org/10.3390/app10196882
Journal volume & issue: Vol. 10, no. 19
p. 6882

Abstract

Read online

This paper presents MAKEDONKA, the first open-source Macedonian language synthesizer that is based on the Deep Learning approach. The paper provides an overview of the numerous attempts to achieve a human-like reproducible speech, which has unfortunately shown to be unsuccessful due to the work invisibility and lack of integration examples with real software tools. The recent advances in Machine Learning, the Deep Learning-based methodologies, provide novel methods for feature engineering that allow for smooth transitions in the synthesized speech, making it sound natural and human-like. This paper presents a methodology for end-to-end speech synthesis that is based on a fully-convolutional sequence-to-sequence acoustic model with a position-augmented attention mechanism—Deep Voice 3. Our model directly synthesizes Macedonian speech from characters. We created a dataset that contains approximately 20 h of speech from a native Macedonian female speaker, and we use it to train the text-to-speech (TTS) model. The achieved MOS score of 3.93 makes our model appropriate for application in any kind of software that needs text-to-speech service in the Macedonian language. Our TTS platform is publicly available for use and ready for integration.

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci

About the journal

Abstract

Keywords