A Situational Analysis of Current Speech-Synthesis Systems for Child Voices: A Scoping Review of Qualitative and Quantitative Evidence

Camryn Terblanche; Michal Harty; Michelle Pascoe; Benjamin V. Tucker

doi:10.3390/app12115623

Applied Sciences (Jun 2022)

Affiliations

Camryn Terblanche: Department of Speech and Language Pathology, University of Cape Town, Cape Town 7700, South Africa
Michal Harty: Department of Speech and Language Pathology, University of Cape Town, Cape Town 7700, South Africa
Michelle Pascoe: Department of Speech and Language Pathology, University of Cape Town, Cape Town 7700, South Africa
Benjamin V. Tucker: Department of Linguistics, University of Alberta, Edmonton, AB T6G 2R3, Canada

DOI: https://doi.org/10.3390/app12115623
Journal volume & issue: Vol. 12, no. 11, p. 5623

Abstract

(1) Background: Speech synthesis has customarily focused on adult speech, but with the rapid development of speech-synthesis technology, it is now possible to create child voices from a limited amount of child-speech data. This scoping review summarises the evidence base related to developing synthesised speech for children. (2) Method: Studies were included if they (a) were published between 2006 and 2021 and (b) included child participants or voices of children aged 2 to 16 years. (3) Results: 58 studies were identified. They were discussed in terms of the languages used, the speech-synthesis systems and/or methods used, the speech data used, the intelligibility of the speech and the ages of the voices. The reviewed studies indicate that developing child-speech synthesis is notably more challenging than adult-speech synthesis, because child speech often presents with acoustic variability and articulatory errors. To account for this, researchers have most often adapted adult-speech models, using a variety of adaptation techniques. (4) Conclusions: Adapting adult speech has proven successful in child-speech synthesis. It appears that the resulting quality can be improved by training on a large amount of pre-selected speech data, chosen with the aid of a neural-network classifier, to better match the children’s speech. We encourage future research on individualised synthetic speech for children with complex communication needs (CCN), with special attention to children who use low-resource languages.

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
