IEEE Access (Jan 2021)

Marginal Effects of Language and Individual Raters on Speech Quality Models

  • Michael Chinen

DOI: https://doi.org/10.1109/ACCESS.2021.3112165
Journal volume & issue: Vol. 9, pp. 127320–127334

Abstract

Speech quality is often measured via subjective testing or with objective estimators of mean opinion score (MOS) such as ViSQOL or POLQA. Typical MOS-estimation frameworks use signal-level features but do not use language features, which have been shown to have an effect on opinion scores. If there is a conditional dependence between score and language given these signal features, introducing language and rater predictors should provide a marginal improvement in predictions. The proposed method uses Bayesian models that predict the individual opinion score instead of the MOS. Several models testing various combinations of predictors were fit, including predictors that capture signal features, such as frequency-band similarity, as well as features related to the listener, such as language and rater indices. The models are fit to the ITU-T P. Supplement 23 dataset, and posterior samples are drawn from distributions of both the model parameters and the resulting opinion-score outcomes. These models are compared to MOS models by integrating over posterior samples per utterance. An experiment was conducted by ablating different predictors for several types of Bayesian hierarchical models (including ordered logistic and truncated normal individual-score distributions, as well as MOS distributions) to find the marginal improvement from language and rater. The models that included language and/or rater obtained significantly lower error (0.601 versus 0.684 root-mean-square error (RMSE)) and higher correlation. Additionally, individual-rater models matched or exceeded the performance of MOS models.
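To make the model family concrete, the sketch below illustrates an ordered-logistic model of individual opinion scores whose linear predictor combines a signal-level feature with per-language and per-rater offsets, as the abstract describes. It is a minimal illustration, not the paper's implementation: the data are synthetic stand-ins for ITU-T P. Supplement 23, all names and dimensions (similarity, n_lang, n_rater, prior scales) are assumptions, and it fits a maximum a posteriori point estimate with SciPy rather than drawing full posterior samples as the paper does.

import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # logistic CDF

# Synthetic stand-in data (hypothetical sizes; the paper fits ITU-T P. Supplement 23).
rng = np.random.default_rng(0)
n_obs, n_lang, n_rater, n_cat = 500, 4, 24, 5
similarity = rng.normal(size=n_obs)          # e.g. a frequency-band similarity feature
lang_idx = rng.integers(0, n_lang, n_obs)    # listener-language index
rater_idx = rng.integers(0, n_rater, n_obs)  # individual-rater index
scores = rng.integers(0, n_cat, n_obs)       # opinion scores 1..5, coded 0..4

def unpack(theta):
    """Split the flat parameter vector into named effects and ordered cutpoints."""
    beta = theta[0]
    lang = theta[1:1 + n_lang]
    rater = theta[1 + n_lang:1 + n_lang + n_rater]
    raw = theta[1 + n_lang + n_rater:]
    # First cutpoint is free; the rest are spaced by positive increments,
    # which keeps the cutpoints strictly ordered.
    cuts = np.concatenate(([raw[0]], raw[0] + np.cumsum(np.exp(raw[1:]))))
    return beta, lang, rater, cuts

def neg_log_posterior(theta):
    beta, lang, rater, cuts = unpack(theta)
    raw = theta[1 + n_lang + n_rater:]
    eta = beta * similarity + lang[lang_idx] + rater[rater_idx]
    # Ordered logistic: P(y = k) = sigmoid(c_k - eta) - sigmoid(c_{k-1} - eta),
    # with c_{-1} = -inf and c_{K-1} = +inf.
    upper = np.where(scores < n_cat - 1,
                     expit(cuts[np.minimum(scores, n_cat - 2)] - eta), 1.0)
    lower = np.where(scores > 0,
                     expit(cuts[np.maximum(scores - 1, 0)] - eta), 0.0)
    loglik = np.log(np.clip(upper - lower, 1e-12, None)).sum()
    # Weakly informative Gaussian priors on all effects (assumed scales).
    logprior = -0.5 * (beta ** 2
                       + (lang ** 2).sum() / 0.5 ** 2
                       + (rater ** 2).sum() / 0.5 ** 2
                       + (raw ** 2).sum())
    return -(loglik + logprior)

theta0 = np.zeros(1 + n_lang + n_rater + (n_cat - 1))
fit = minimize(neg_log_posterior, theta0, method="L-BFGS-B")
beta_hat, lang_hat, rater_hat, cuts_hat = unpack(fit.x)
print("signal-feature weight:", round(float(beta_hat), 3))
print("per-language offsets:", np.round(lang_hat, 3))

Replacing the MAP step with an MCMC sampler (e.g., in PyMC or Stan) would recover posterior draws over both the parameters and the predicted score outcomes, which is what the paper integrates over per utterance to compare individual-score models against MOS models.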

Keywords