IEEE Access (Jan 2024)
Emotional Text-To-Speech in Japanese Using Artificially Augmented Dataset
Abstract
This study explores the feasibility of using artificial emotional speech datasets, generated with existing voice-synthesis software, as an alternative to human-recorded datasets for emotional speech synthesis. Focusing on Japanese, we assess the viability of such artificial datasets for languages with limited emotional speech resources. Our approach combines qualitative and quantitative analyses to evaluate how well synthetic emotional speech replicates human-like emotional expression. The results demonstrate that while artificial datasets can approximate certain emotional states, they fall significantly short of reproducing the full range of human emotions, particularly subtle or mixed ones. These findings underscore both the potential and the current constraints of artificial datasets in emotional speech synthesis, and suggest avenues for future research to enhance the quality and emotional expressiveness of synthetic speech.
Keywords