IEEE Access (Jan 2024)

Emotional Text-To-Speech in Japanese Using Artificially Augmented Dataset

  • Mujahid Jamal A. Khalifah,
  • Michal Ptaszynski,
  • Fumito Masui

DOI
https://doi.org/10.1109/ACCESS.2024.3495694
Journal volume & issue
Vol. 12
pp. 167724 – 167777

Abstract
This study explores the feasibility of using artificial emotional speech datasets, generated by existing artificial voice-generating software, as an alternative to human-recorded datasets for emotional speech synthesis. Focusing on Japanese, we assess the viability of these artificial datasets for languages with limited emotional speech resources. Our approach combines qualitative and quantitative analyses to evaluate how effectively synthetic emotional speech replicates human-like emotional expression. The results demonstrate that while artificial datasets can approximate certain emotional states, they face significant limitations in replicating the full range of human emotions, particularly subtle or mixed ones. These findings underscore both the potential and the current constraints of using artificial datasets in emotional speech synthesis, suggesting avenues for future research to enhance the quality and emotional expressiveness of synthetic speech.

Keywords