Overview of Speech Synthesis and Voice Conversion Technology Based on Deep Learning

PAN Xiao-qin, LU Tian-liang, DU Yan-hui, TONG Xin

doi:10.11896/jsjkx.200500148

Jisuanji kexue (Aug 2021)

Affiliations

PAN Xiao-qin, LU Tian-liang, DU Yan-hui, TONG Xin: College of Information and Cyber Security, People's Public Security University of China, Beijing 100038, China

DOI: https://doi.org/10.11896/jsjkx.200500148
Journal volume & issue: Vol. 48, no. 8
pp. 200 – 208

Abstract

Driven by deep learning, voice information processing technology is developing rapidly. Combining speech synthesis with voice conversion enables real-time, high-fidelity voice output for designated targets and content, and has broad application prospects in human-machine interaction, pan-entertainment, and other fields. This paper provides an overview of speech synthesis and voice conversion technology based on deep learning. First, it briefly reviews the development of both technologies. Next, it lists the common public datasets in these fields to support related research. It then discusses text-to-speech (TTS) models, covering classic and cutting-edge models and algorithms for style, rhythm, and speed, and compares their performance and development potential. It then reviews voice conversion by summarizing conversion methods and optimization methods. Finally, it summarizes the applications and challenges of speech synthesis and voice conversion and, based on the problems they face in terms of models, applications, and regulation, looks ahead to future directions in model compression, few-shot learning, and forgery detection.

Keywords: voice information processing, speech synthesis, voice conversion, deep learning, generative adversarial networks

Published in Jisuanji kexue

ISSN: 1002-137X (Print)
Publisher: Editorial office of Computer Science
Country of publisher: China
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software; Technology: Technology (General)
Website: http://www.jsjkx.com/CN/1002-137X/home.shtml
