大数据 (Sep 2024)
A survey of emotional speech synthesis
Abstract
Speech synthesis, a major research area in speech technology, converts text into speech. With the rapid development of deep learning, its objective has moved beyond merely producing "understandable" audio. Incorporating emotion often makes synthesized speech more expressive. Emotional speech synthesis therefore aims to render speech in different emotions and to control those emotions, so that the generated emotional speech is both flexible and precise. Starting from several key issues in emotional speech synthesis, this paper summarizes and analyzes recent progress in emotion transfer, emotion intensity control, and emotion mixing, and introduces the relevant datasets and evaluation metrics. Finally, future directions for emotional speech synthesis are discussed.