Combining Autoregressive Models and Phonological Knowledge Bases for Improved Accuracy in Korean Grapheme-to-Phoneme Conversion

Sung-Ki Choi; Hyuk-Chul Kwon; Minho Kim

doi:10.1109/ACCESS.2025.3581981

IEEE Access (Jan 2025)

Affiliations

Sung-Ki Choi: Department of Electrical and Computer Engineering, Pusan National University, Busan, Republic of Korea
Hyuk-Chul Kwon: Department of Electrical and Computer Engineering, Pusan National University, Busan, Republic of Korea
Minho Kim: Division of Artificial Intelligence, National Korea Maritime and Ocean University, Busan, Republic of Korea

DOI: https://doi.org/10.1109/ACCESS.2025.3581981
Journal volume & issue: Vol. 13, pp. 107678–107693

Abstract

Recent advances in deep learning have highlighted the importance of Grapheme-to-Phoneme (G2P) conversion in natural language processing and speech synthesis. Korean exhibits complex phonological changes such as liaison, the initial sound law, and consonant assimilation, which makes it difficult to handle every exceptional pattern with simple models alone. In this paper, we propose integrating an autoregressive (AR) model with a phonological knowledge base that leverages standard pronunciation rules and a pre-analyzed dictionary, and we conduct systematic comparisons with non-autoregressive (NAR) variants to validate the effectiveness of sequential processing for Korean G2P. Experimental results show that a BiLSTM-LSTM AR model based on ELECTRA embeddings achieves a phoneme error rate (PER) of 0.2%, a word error rate (WER) of 0.68%, and a sentence accuracy of 95.16%, outperforming both a traditional rule-based approach (24.51% sentence accuracy) and NAR variants implemented on the same dataset (81.32%–85.72% sentence accuracy). When syllable constraints are introduced, accuracy improves significantly to 95.41% (p < 0.05). Incorporating a pre-analyzed dictionary also improves the handling of neologisms and proper nouns without substantially increasing inference time. Through comprehensive same-dataset comparisons between AR and NAR approaches, our study demonstrates the effectiveness of combining AR models with phonological knowledge bases in Korean G2P, with AR models consistently outperforming NAR variants by 3.12%–8.04% in sentence accuracy, and it identifies key remaining challenges, including rule conflicts and the handling of multiple standard pronunciations. The methodology and findings provide a generalizable framework for other morpho-phonologically complex languages.
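The three metrics reported in the abstract (PER, WER, and sentence accuracy) can be illustrated with a minimal sketch. The abstract does not give the paper's exact definitions, so the sketch assumes the common ones: PER and WER are Levenshtein edit distances normalized by reference length, and sentence accuracy is the share of sentences predicted without any error. The function names `edit_distance` and `g2p_metrics` and the romanized toy phoneme symbols are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence

Sentence = List[List[str]]  # a sentence is a list of words; a word is a list of phoneme symbols


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (substitution, insertion, deletion cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def g2p_metrics(refs: List[Sentence], hyps: List[Sentence]) -> Dict[str, float]:
    """Assumed definitions: PER and WER are edit distance over reference length;
    sentence accuracy is the fraction of exactly matching sentences."""
    phone_errs = phone_total = word_errs = word_total = exact = 0
    for ref, hyp in zip(refs, hyps):
        # Phoneme level: flatten each sentence into one phoneme sequence.
        ref_phones = [p for word in ref for p in word]
        hyp_phones = [p for word in hyp for p in word]
        phone_errs += edit_distance(ref_phones, hyp_phones)
        phone_total += len(ref_phones)
        # Word level: a word counts as one token, compared as a whole.
        ref_words = [tuple(word) for word in ref]
        hyp_words = [tuple(word) for word in hyp]
        word_errs += edit_distance(ref_words, hyp_words)
        word_total += len(ref_words)
        exact += int(ref_words == hyp_words)
    return {
        "per": phone_errs / phone_total,
        "wer": word_errs / word_total,
        "sentence_accuracy": exact / len(refs),
    }


# Toy data with romanized symbols (illustrative only).
# Sentence 1: 국물 is pronounced [궁물] (nasal assimilation); the hypothesis misses it.
# Sentence 2: 같이 is pronounced [가치] (palatalization); the hypothesis is correct.
refs = [[["k", "u", "ng", "m", "u", "l"]], [["k", "a", "ch", "i"]]]
hyps = [[["k", "u", "k", "m", "u", "l"]], [["k", "a", "ch", "i"]]]

metrics = g2p_metrics(refs, hyps)
print(metrics)
```

On this toy data, one of ten reference phonemes is wrong, one of two words is wrong, and one of two sentences is fully correct. A single phoneme error is enough to make a whole sentence count as incorrect, which is why sentence accuracy can separate systems far more sharply than PER does.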

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
