Signals (Jul 2021)

Voice Transformation Using Two-Level Dynamic Warping and Neural Networks

  • Al-Waled Al-Dulaimi,
  • Todd K. Moon,
  • Jacob H. Gunther

DOI
https://doi.org/10.3390/signals2030028
Journal volume & issue
Vol. 2, no. 3
pp. 456 – 474

Abstract

Voice transformation, for example, from a male speaker to a female speaker, is achieved here using a two-level dynamic warping algorithm in conjunction with an artificial neural network. An outer warping process, which temporally aligns blocks of speech (dynamic time warp, DTW), invokes an inner warping process, which spectrally aligns them based on magnitude spectra (dynamic frequency warp, DFW). The mapping function produced by the inner dynamic frequency warp is used to move spectral information from a source speaker to a target speaker. Artifacts arising from this amplitude spectral mapping are reduced by reconstructing phase information. Information obtained by this process is used to train an artificial neural network to produce spectral warping information based on spectral input data. The performance of the speech mapping is compared with previous voice transformation research using Mel-Cepstral Distortion (MCD), and it is shown to perform better than other methods, based on their reported MCD scores.
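As a rough illustration of two building blocks named in the abstract, the sketch below shows a textbook dynamic time warp over feature frames and a commonly used form of the Mel-Cepstral Distortion. It is not the authors' two-level algorithm: the inner frequency warp, the phase reconstruction, and the neural network are omitted. The function names `dtw_path` and `mel_cepstral_distortion`, the Euclidean frame cost, and the exclusion of the 0th cepstral coefficient are assumptions made for this example, and the paper may use different variants.

```python
import numpy as np


def dtw_path(source, target):
    """Textbook DTW between two sequences of feature frames (one frame per row).

    Returns the alignment path as (source_index, target_index) pairs
    and the accumulated cost of that path.
    """
    n, m = len(source), len(target)
    # Euclidean distance between every source frame and every target frame
    # (assumed cost; other frame distances are possible).
    cost = np.linalg.norm(source[:, None, :] - target[None, :, :], axis=2)

    # acc[i, j] = cheapest cost of aligning the first i source frames
    # with the first j target frames.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1],  # advance both sequences
                acc[i - 1, j],      # advance source only
                acc[i, j - 1],      # advance target only
            )

    # Backtrack from the end to recover the alignment path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin((acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return path, float(acc[n, m])


def mel_cepstral_distortion(reference, converted):
    """Mean MCD in dB over time-aligned mel-cepstral frames.

    Assumes both arrays have the same shape (frames x coefficients) and
    skips column 0, which usually carries energy rather than spectral shape.
    """
    diff = reference[:, 1:] - converted[:, 1:]
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return float(np.mean(per_frame))
```

In a typical evaluation, the path from `dtw_path` would be used to pick matching frame pairs from the converted and target utterances before passing them to `mel_cepstral_distortion`; lower values indicate closer spectral envelopes.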

Keywords