大数据 (Big Data), May 2024
A survey of voice conversion based on non-parallel data
Abstract
Voice conversion is a research topic in the fields of speech processing and artificial intelligence. Its goal is to change the timbre of speech while preserving the linguistic content of the source speech, so that the result sounds as if it were spoken by the target speaker. Both the quality and the naturalness of the converted speech must be ensured. Voice conversion based on non-parallel data currently attracts much attention. In this setting, models are trained on non-parallel multi-speaker datasets, which enables many-to-many and any-to-any voice conversion. This paper provides a comprehensive summary and analysis of recent developments in non-parallel voice conversion. First, we outline early voice conversion techniques based on parallel corpora and their limitations. Then, we introduce, compare, and analyze in depth various approaches to voice conversion based on non-parallel data. Finally, we summarize voice conversion technology and give an outlook on its future development.