Non-parallel dictionary learning for voice conversion using non-negative Tucker decomposition

Yuki Takashima; Toru Nakashika; Tetsuya Takiguchi; Yasuo Ariki

doi:10.1186/s13636-019-0160-1

EURASIP Journal on Audio, Speech, and Music Processing (Sep 2019)

Affiliations

Yuki Takashima: Graduate School of System Informatics, Kobe University
Toru Nakashika: Graduate School of Informatics and Engineering, The University of Electro-Communications
Tetsuya Takiguchi: Graduate School of System Informatics, Kobe University
Yasuo Ariki: Graduate School of System Informatics, Kobe University

Journal volume & issue: Vol. 2019, no. 1, pp. 1–11

Abstract

Voice conversion (VC) is a technique that converts only the speaker-specific information in the source speech while preserving the associated phonemic information. Non-negative matrix factorization (NMF)-based VC has been widely researched because it produces a more natural-sounding voice than conventional Gaussian mixture model-based VC. In conventional NMF-VC, models are trained on parallel data, so the speech data require elaborate pre-processing to generate that parallel data. NMF-VC also tends to be a large model, because its dictionary matrix holds several parallel exemplars, which leads to a high computational cost. In this study, an innovative parallel dictionary-learning method using non-negative Tucker decomposition (NTD) is proposed. The proposed method uses tensor decomposition to decompose an input observation into a set of mode matrices and one core tensor. The proposed NTD-based dictionary-learning method estimates the dictionary matrix for NMF-VC without using parallel data. The experimental results show that the proposed method outperforms other methods in both parallel and non-parallel settings.
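The decomposition named in the abstract, one core tensor plus a set of mode matrices, can be illustrated with a generic sketch. The code below is not the authors' algorithm: it is a textbook-style non-negative Tucker decomposition of a 3-way tensor using multiplicative updates under a squared-error (Frobenius) cost, which is an assumption here, since the abstract does not state the cost function, the tensor construction, or the update rules. The function names `reconstruct` and `ntd`, the ranks, and the random toy tensor are all hypothetical.

```python
import numpy as np


def reconstruct(G, A, B, C):
    """Rebuild a 3-way tensor from a core tensor G and mode matrices A, B, C."""
    return np.einsum("pqr,ip,jq,kr->ijk", G, A, B, C)


def ntd(X, ranks, n_iter=200, eps=1e-9, seed=0):
    """Illustrative non-negative Tucker decomposition (assumed Frobenius cost).

    X     : non-negative array of shape (I, J, K)
    ranks : (P, Q, R), the size of the core tensor
    Returns the core tensor G and the mode matrices A, B, C, all non-negative.
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    P, Q, R = ranks
    # Positive random initialisation keeps every multiplicative update non-negative.
    A = rng.random((I, P)) + eps
    B = rng.random((J, Q)) + eps
    C = rng.random((K, R)) + eps
    G = rng.random((P, Q, R)) + eps

    for _ in range(n_iter):
        # Mode-1 matrix: fold the core with the other two mode matrices first.
        S = np.einsum("pqr,jq,kr->pjk", G, B, C)
        A *= np.einsum("ijk,pjk->ip", X, S) / (A @ np.einsum("pjk,sjk->ps", S, S) + eps)

        # Mode-2 matrix.
        S = np.einsum("pqr,ip,kr->iqk", G, A, C)
        B *= np.einsum("ijk,iqk->jq", X, S) / (B @ np.einsum("iqk,isk->qs", S, S) + eps)

        # Mode-3 matrix.
        S = np.einsum("pqr,ip,jq->ijr", G, A, B)
        C *= np.einsum("ijk,ijr->kr", X, S) / (C @ np.einsum("ijr,ijs->rs", S, S) + eps)

        # Core tensor: project X onto the mode matrices, normalise by their Gram matrices.
        num = np.einsum("ijk,ip,jq,kr->pqr", X, A, B, C)
        den = np.einsum("pqr,ap,bq,cr->abc", G, A.T @ A, B.T @ B, C.T @ C) + eps
        G *= num / den

    return G, A, B, C


# Toy usage on a synthetic non-negative tensor (random data, not speech features).
rng = np.random.default_rng(1)
X = reconstruct(rng.random((2, 2, 2)), rng.random((6, 2)),
                rng.random((5, 2)), rng.random((4, 2)))
G, A, B, C = ntd(X, ranks=(2, 2, 2))
rel_err = np.linalg.norm(X - reconstruct(G, A, B, C)) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.4f}")
```

In this sketch every factor stays non-negative because each update multiplies the current value by a ratio of non-negative quantities. How the paper arranges speech features into a tensor and which mode matrix serves as the NMF-VC dictionary are not specified in the abstract, so the sketch leaves those choices out.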

Published in EURASIP Journal on Audio, Speech, and Music Processing

ISSN: 1687-4722 (Online)
Publisher: SpringerOpen
Country of publisher: United Kingdom
LCC subjects: Science: Physics: Acoustics. Sound; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://asmp-eurasipjournals.springeropen.com
