Intelligent Systems with Applications (May 2022)

Neural network based autonomous control of a speech synthesis system

  • Dimokritos Panagiotopoulos,
  • Christos Orovas,
  • Dimitrios Syndoukas

Journal volume & issue
Vol. 14
p. 200077

Abstract

This work is inspired by the ability of neural systems to control specialized actuator mechanisms in living organisms by monitoring the end effect of their actions. As an example of such a mechanism, we consider the human vocal tract, where neurons learn to activate the muscles that move the velum, jaw, tongue, and lips in order to produce a desired phonetic activity. As a technical approximation of this setup, we pair an artificial neural network (ANN) with a speech synthesizer and study the ability of the ANN to estimate the synthesizer’s parameters for a desired speech output. We assume that the training error is obtained by measuring the “perceived distance” between the original (target) and the synthesized speech signals. The training error must therefore be measured after the output of the speech synthesizer has been processed, rather than directly at the outputs of the ANN. This requirement restricts the use of widely used ANN training algorithms based on backpropagation of gradients, but it can be met by our previously proposed “Heuristically Enhanced Gradient Approximation” (HEGA) algorithm. We also propose enhancements to HEGA that further improve its performance in this demanding application.
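
The constraint described above, an error that is only observable after a black-box synthesizer, can be illustrated with a minimal sketch. It is not the HEGA algorithm, whose details are not given in this abstract; it uses a plain central finite-difference estimate of the gradient as a stand-in. All names (`synthesize`, `perceived_distance`, `network`, `approx_gradient`, `train`) are hypothetical, the "synthesizer" is a toy two-sinusoid generator, and the "perceived distance" is a simple mean squared difference, chosen only so that the example is self-contained.

```python
import math


def synthesize(params, n=64):
    """Toy stand-in for a speech synthesizer (assumption, not the paper's).

    Treated as a black box: two gain parameters in, a signal out.
    """
    a1, a2 = params
    return [
        a1 * math.sin(2 * math.pi * 2 * t / n)
        + a2 * math.sin(2 * math.pi * 5 * t / n)
        for t in range(n)
    ]


def perceived_distance(signal, target):
    """Toy stand-in for a perceptual distance: mean squared difference."""
    return sum((s - r) ** 2 for s, r in zip(signal, target)) / len(target)


def network(weights, x):
    """Minimal 'ANN': one linear layer mapping input x to two parameters."""
    w0, b0, w1, b1 = weights
    return (w0 * x + b0, w1 * x + b1)


def loss(weights, x, target):
    """Error measured after the synthesizer, not at the network outputs."""
    return perceived_distance(synthesize(network(weights, x)), target)


def approx_gradient(weights, x, target, eps=1e-4):
    """Central finite-difference gradient estimate w.r.t. each weight.

    Only loss evaluations are used; nothing is back-propagated through
    the synthesizer.
    """
    grad = []
    for i in range(len(weights)):
        up = list(weights)
        down = list(weights)
        up[i] += eps
        down[i] -= eps
        grad.append((loss(up, x, target) - loss(down, x, target)) / (2 * eps))
    return grad


def train(weights, x, target, steps=200, lr=0.1):
    """Plain gradient descent driven by the approximated gradient."""
    for _ in range(steps):
        grad = approx_gradient(weights, x, target)
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


if __name__ == "__main__":
    target = synthesize((0.8, 0.3))  # hypothetical "desired speech" signal
    start = [0.0, 0.0, 0.0, 0.0]
    trained = train(start, 1.0, target)
    print(loss(start, 1.0, target), loss(trained, 1.0, target))
```

Each training step here costs two synthesizer evaluations per weight, which becomes expensive for networks of realistic size; that cost is one reason heuristic gradient approximation schemes are of interest in this setting.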

Keywords