A Simple Speech Production System Based on Formant Estimation of a Tongue Articulatory System Using Human Tongue Orientation

Palli Padmini; Deepa Gupta; Mohammed Zakariah; Yousef Ajami Alotaibi; Kaustav Bhowmick

doi:10.1109/ACCESS.2020.3048076

IEEE Access (Jan 2021)

Affiliations

Palli Padmini: Department of Electronics & Communication Engineering, Amrita School of Engineering, Bengaluru, India
Deepa Gupta: Department of Computer & Science Engineering, Amrita School of Engineering, Bengaluru, India
Mohammed Zakariah: Research Center, College of Computer and Information Science, King Saud University, Riyadh, Saudi Arabia
Yousef Ajami Alotaibi: Computer Engineering Department, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
Kaustav Bhowmick: Department of Electronics and Communication Engineering, PES University, Bengaluru, India

DOI: https://doi.org/10.1109/ACCESS.2020.3048076
Journal volume & issue: Vol. 9
pp. 4688 – 4710

Abstract

An algorithm for a potentially non-obtrusive speech production system was developed and characterized. The algorithm is primarily based on the articulation of the human tongue, referred to as the tongue articulatory system (TAS), and was cascaded with a previously developed laryngeal model. We developed and optimized statistical formulae for the formants of vowels and consonants and studied the model for different ages and genders. The difference between the formant frequencies obtained using the established vocal tract system and those obtained using the proposed cascaded system was found to be < 5%. The proposed model shows the significance of the articulatory nature of the tongue in human speech production. An algorithmic speech synthesizer was developed, and its output was matched with original speech signals for English vowels and consonants with a normalized root-mean-square error (NRMSE) of < 0.15 ms. Further, the developed algorithm was implemented experimentally, with flex sensors emulating the tongue in an artificial oral cavity. The experimental test results further confirmed the effectiveness of the algorithm, revealing interesting features under tolerance analyses. The work relates to a means of compensating for a whole or partial loss of speech. Such a model can be useful for interpreting speech for tracheostomised patients who have undergone larynx surgery and for people who are speech-disabled due to accidents or voice disorders, as well as in medical rehabilitation and robotics.
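
The two comparison metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not state how the NRMSE is normalized, so the version below assumes normalization by the range of the reference signal, which is one common convention and not necessarily the authors' choice. The function names `nrmse` and `percent_difference` and all numeric values are hypothetical and are not taken from the paper.

```python
import math


def nrmse(reference, synthesized):
    """RMSE between two equal-length signals, divided by the range of the reference.

    Assumption: range normalization; the paper may use a different convention.
    """
    if len(reference) != len(synthesized) or len(reference) == 0:
        raise ValueError("signals must be non-empty and of equal length")
    mse = sum((r - s) ** 2 for r, s in zip(reference, synthesized)) / len(reference)
    span = max(reference) - min(reference)
    if span == 0:
        raise ValueError("reference signal has zero range")
    return math.sqrt(mse) / span


def percent_difference(f_reference, f_model):
    """Absolute difference between two formant frequencies, as a percentage of the reference."""
    return abs(f_model - f_reference) / f_reference * 100.0


# Illustrative values only (not measurements from the paper).
original = [0.0, 2.0]
synthetic = [1.0, 1.0]
error = nrmse(original, synthetic)
formant_gap = percent_difference(500.0, 520.0)
```

With this convention, a formant estimate would meet the abstract's criterion when `percent_difference` returns a value below 5.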

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
