IEEE Access (Jan 2021)
A Simple Speech Production System Based on Formant Estimation of a Tongue Articulatory System Using Human Tongue Orientation
Abstract
An algorithm for a potentially non-obtrusive speech production system was developed and characterized. The algorithm is primarily based on the articulation of the human tongue, referred to as the tongue articulatory system (TAS), and was cascaded with a previously developed laryngeal model. We developed and optimized statistical formulae for the formants of vowels and consonants and studied the model for different ages and genders. The difference between the formant frequencies obtained using the established vocal tract system and those obtained using the proposed cascaded system was found to be < 5%. The proposed model shows the significance of the articulatory nature of the tongue in human speech production. An algorithmic speech synthesizer was developed, and its output was matched with original speech signals for English vowels and consonants with a normalized root-mean-square error (NRMSE) of < 0.15 ms. Further, an experimental implementation of the developed algorithm was carried out, with flex sensors emulating the tongue in an artificial oral cavity. The experimental results further confirmed the effectiveness of the algorithm and revealed interesting features under tolerance analyses. The approach offers a means of compensating for a total or partial loss of speech. Such a model can be useful for interpreting speech for tracheostomised patients who have undergone larynx surgery and for people with speech disabilities due to accidents or voice disorders, as well as in medical rehabilitation and robotics.
Keywords