Applied Sciences (Dec 2020)

GMM-Based Evaluation of Synthetic Speech Quality Using 2D Classification in Pleasure-Arousal Scale

  • Jiří Přibil,
  • Anna Přibilová,
  • Jindřich Matoušek

DOI
https://doi.org/10.3390/app11010002
Journal volume & issue
Vol. 11, no. 1
p. 2

Abstract

Read online

The paper focuses on the description of a system for the automatic evaluation of synthetic speech quality based on the Gaussian mixture model (GMM) classifier. The speech material originating from a real speaker is compared with synthesized material to determine similarities or differences between them. The final evaluation order is determined by distances in the Pleasure-Arousal (P-A) space between the original and synthetic speech using different synthesis and/or prosody manipulation methods implemented in the Czech text-to-speech system. The GMM models for continual 2D detection of P-A classes are trained using the sound/speech material from the databases without any relation to the original speech or the synthesized sentences. Preliminary and auxiliary analyses show a substantial influence of the number of mixtures, the number and type of the speech features used the size of the processed speech material, as well as the type of the database used for the creation of the GMMs on the P-A classification process and on the final evaluation result. The main evaluation experiments confirm the functionality of the system developed. The objective evaluation results obtained are principally correlated with the subjective ratings of human evaluators; however, partial differences were indicated, so a subsequent detailed investigation must be performed.

Keywords