A Preliminary Study of Model-Generated Speech

Man-Ni Chu; Yu-Chun Wang

doi:10.3390/app14073104

Applied Sciences (Apr 2024)

Affiliations

Man-Ni Chu: Graduate Institute of Cross-Cultural Studies, Fu Jen University, New Taipei City 242062, Taiwan
Yu-Chun Wang: Department of Buddhist Studies, Dharma Drum Institute of Liberal Arts, New Taipei City 208303, Taiwan

Journal volume & issue: Vol. 14, no. 7, p. 3104

Abstract

This study compared model-generated sounds with the process of sound acquisition in humans. It drew on two dictionaries of the Chaoshan dialect that span approximately one century. Identical Chinese characters were selected from each dictionary, and their contemporary pronunciations were documented. Inconsistencies in pronunciation were then corrected manually, after which three machine learning methods were trained to map the pronunciations in one dictionary to those in the other: an attention-based sequence-to-sequence model, DirecTL+, and Sequitur. Model accuracy was evaluated using five-fold cross-validation, with a maximum accuracy of 68%. The study also investigated how the probability of a sound’s subsequent unit influences the accuracy of the machine learning methods. The attention-based sequence-to-sequence model was found to be influenced not only by input frequency but also by the probability of the subsequent unit.
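
The two quantitative ideas in the abstract, five-fold cross-validation and the probability of a subsequent unit, can be illustrated with a minimal sketch. This is not the authors' code: the function names (`next_unit_probabilities`, `k_fold_indices`, `word_accuracy`) and the toy sequences are hypothetical, the bigram estimate is only one plausible reading of "probability of the subsequent unit", and exact-match accuracy is an assumed scoring rule.

```python
from collections import Counter, defaultdict


def next_unit_probabilities(sequences):
    """Estimate P(next unit | current unit) from bigram counts (assumed definition)."""
    pair_counts = defaultdict(Counter)
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            pair_counts[cur][nxt] += 1
    return {
        cur: {nxt: n / sum(counts.values()) for nxt, n in counts.items()}
        for cur, counts in pair_counts.items()
    }


def k_fold_indices(n_items, k=5):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_items) if i < start or i >= start + size]
        yield train, test
        start += size


def word_accuracy(predicted, reference):
    """Fraction of items whose predicted pronunciation matches the reference exactly."""
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# Toy, made-up unit sequences; these are NOT Chaoshan dictionary data.
toy_sequences = [["k", "a", "n"], ["k", "a", "m"], ["k", "i", "n"]]
probs = next_unit_probabilities(toy_sequences)

# Each of the five folds holds out a different fifth of the items for testing.
folds = list(k_fold_indices(10, k=5))
```

In a full evaluation, a model would be trained on each fold's training indices, scored with `word_accuracy` on the held-out indices, and the five scores averaged.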

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
