A survey of emotional speech synthesis

SHI Haoxiang; ZHANG Xulong; WANG Jianzong; CHENG Ning; XIAO Jing

大数据 (Sep 2024)

Journal volume & issue: Vol. 10
pp. 56 – 73

Abstract

Speech synthesis, a significant research area in speech technology, is dedicated to converting text into speech. With the rapid development of deep learning, its objective has evolved beyond merely producing "understandable" audio, and incorporating emotion often enhances the expressiveness of synthesized speech. Emotional speech synthesis therefore aims to render speech with different emotions and to regulate those emotions so that the generated speech is both flexible and precise. Starting from several key issues in emotional speech synthesis, this paper summarizes and analyzes recent developments in emotion transfer, emotion intensity control, and emotion mixing, and introduces the relevant datasets and evaluation metrics. Finally, future directions for emotional speech synthesis are discussed.

Keywords: emotional speech synthesis; emotion transfer; emotion intensity; deep learning

Published in 大数据

ISSN: 2096-0271 (Print)
Publisher: China InfoCom Media Group
Country of publisher: China
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://www.infocomm-journal.com/bdr/EN/2096-0271/home.shtml