Hierarchical Phoneme Classification for Improved Speech Recognition

Donghoon Oh; Jeong-Sik Park; Ji-Hwan Kim; Gil-Jin Jang

doi:10.3390/app11010428

Applied Sciences (Jan 2021)

Affiliations

Donghoon Oh: SK Holdings C&C, Gyeonggi-do 13558, Korea
Jeong-Sik Park: Department of English Linguistics and Language Technology, Hankuk University of Foreign Studies, Seoul 02450, Korea
Ji-Hwan Kim: Department of Computer Science and Engineering, Sogang University, Seoul 04107, Korea
Gil-Jin Jang: School of Electronic and Electrical Engineering, Kyungpook National University, Daegu 41566, Korea

Journal volume & issue: Vol. 11, no. 1, p. 428

Abstract

Speech recognition converts input sound into a sequence of phonemes and then uses language models to find the text for that input. Phoneme classification performance is therefore a critical factor in the successful implementation of a speech recognition system. However, correctly distinguishing phonemes with similar characteristics remains challenging even for state-of-the-art classification methods, and the resulting classification errors are hard to recover from in the subsequent language processing steps. This paper proposes a hierarchical phoneme clustering method that applies more suitable recognition models to different phonemes. The phonemes of the TIMIT database are analyzed using a confusion matrix from a baseline speech recognition model. Using the automatic phoneme clustering results, a set of phoneme classification models optimized for the generated phoneme groups is constructed and integrated into a hierarchical phoneme classification method. In phoneme classification experiments, the proposed hierarchical phoneme group models improved performance over the baseline by 3%, 2.1%, 6.0%, and 2.2% for fricative, affricate, stop, and nasal sounds, respectively. The average accuracy was 69.5% for the baseline and 71.7% for the proposed hierarchical models, an overall improvement of 2.2%.
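The clustering step described in the abstract, grouping phonemes that a baseline model tends to confuse, can be illustrated with a small sketch. The abstract does not state which clustering algorithm or similarity measure the authors used, so the average-linkage agglomerative procedure, the symmetric confusion-rate similarity, the function names, and the six-phoneme confusion matrix below are all assumptions made for illustration. The counts are toy values, not results from the paper or from TIMIT.

```python
# Toy sketch: group phonemes that a baseline classifier confuses with each other.
# All names and numbers here are hypothetical; the paper's actual method may differ.

def confusion_similarity(conf):
    """Turn a count matrix (rows = true phoneme, columns = predicted phoneme)
    into a symmetric similarity based on mutual confusion rates."""
    n = len(conf)
    rates = [[conf[i][j] / sum(conf[i]) for j in range(n)] for i in range(n)]
    return [[(rates[i][j] + rates[j][i]) / 2 for j in range(n)] for i in range(n)]


def cluster_phonemes(labels, conf, num_groups):
    """Average-linkage agglomerative clustering (an assumed choice):
    repeatedly merge the two groups with the highest mean mutual confusion."""
    sim = confusion_similarity(conf)
    clusters = [[i] for i in range(len(labels))]
    while len(clusters) > num_groups:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
                score = sum(sim[i][j] for i, j in pairs) / len(pairs)
                if best is None or score > best[0]:
                    best = (score, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(labels[i] for i in group) for group in clusters]


# Toy confusion counts (each row sums to 100); not taken from the paper.
labels = ["s", "z", "p", "b", "m", "n"]
conf = [
    [80, 15, 2, 1, 1, 1],   # true "s"
    [18, 75, 2, 2, 1, 2],   # true "z"
    [2, 1, 70, 22, 3, 2],   # true "p"
    [1, 2, 25, 68, 2, 2],   # true "b"
    [1, 1, 2, 2, 72, 22],   # true "m"
    [1, 2, 1, 2, 24, 70],   # true "n"
]

groups = cluster_phonemes(labels, conf, num_groups=3)
print(groups)
```

With these toy counts, the pairs that are most often mistaken for each other end up in the same group. In a hierarchical scheme of the kind the abstract describes, a first-stage model would then pick the group and a model specialized for that group would pick the phoneme within it.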

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
