IEEE Access (Jan 2025)

Combining Autoregressive Models and Phonological Knowledge Bases for Improved Accuracy in Korean Grapheme-to-Phoneme Conversion

  • Sung-Ki Choi,
  • Hyuk-Chul Kwon,
  • Minho Kim

DOI
https://doi.org/10.1109/ACCESS.2025.3581981
Journal volume & issue
Vol. 13
pp. 107678 – 107693

Abstract


Recent advances in deep learning have highlighted the importance of Grapheme-to-Phoneme (G2P) conversion in natural language processing and speech synthesis. Korean exhibits complex phonological changes such as liaison, the initial sound law, and consonant assimilation, which makes it difficult to handle all exceptional patterns with simple models alone. In this paper, we propose integrating an autoregressive (AR) model with a phonological knowledge base that leverages standard pronunciation rules and a pre-analyzed dictionary, and we conduct systematic comparisons with non-autoregressive (NAR) variants to validate the effectiveness of sequential processing for Korean G2P. Experimental results show that a BiLSTM-LSTM AR model based on ELECTRA embeddings achieves a phoneme error rate (PER) of 0.2%, a word error rate (WER) of 0.68%, and a sentence accuracy of 95.16%, outperforming both a traditional rule-based approach (24.51% sentence accuracy) and NAR variants implemented on the same dataset (81.32%–85.72% sentence accuracy). When syllable constraints are introduced, accuracy improves significantly to 95.41% (p < 0.05). Meanwhile, incorporating a pre-analyzed dictionary improves the handling of neologisms and proper nouns without substantially increasing inference time. Through comprehensive same-dataset comparisons between AR and NAR approaches, our study demonstrates the effectiveness of combining AR models with phonological knowledge bases in Korean G2P, with AR models consistently outperforming NAR variants by 3.12–8.04% in sentence accuracy, while identifying key challenges including rule conflicts and the handling of multiple standard pronunciations. The methodology and findings provide a generalizable framework for other morpho-phonologically complex languages.
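The three metrics reported above can be illustrated with a minimal sketch. It is not the authors' evaluation code. It assumes that PER is the phoneme-level edit distance divided by the number of reference phonemes, that WER is the share of words with at least one phoneme error, and that sentence accuracy is the share of sentences reproduced exactly; the abstract does not state these definitions. The function names and the romanized toy phonemes are hypothetical.

```python
from typing import List, Sequence, Tuple

Word = List[str]        # one word as a list of phoneme symbols
Sentence = List[Word]   # one sentence as a list of words


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def g2p_metrics(refs: List[Sentence], hyps: List[Sentence]) -> Tuple[float, float, float]:
    """Return (PER, WER, sentence accuracy) as fractions in [0, 1].

    Assumption: hypothesis and reference sentences contain the same number
    of words, so words can be compared position by position.
    """
    if not refs or len(refs) != len(hyps):
        raise ValueError("refs and hyps must be non-empty and equally long")
    phone_errors = phone_total = word_errors = word_total = sent_correct = 0
    for ref_sent, hyp_sent in zip(refs, hyps):
        if len(ref_sent) != len(hyp_sent):
            raise ValueError("word counts differ within a sentence pair")
        for ref_word, hyp_word in zip(ref_sent, hyp_sent):
            phone_errors += edit_distance(ref_word, hyp_word)
            phone_total += len(ref_word)
            word_errors += int(ref_word != hyp_word)
            word_total += 1
        sent_correct += int(ref_sent == hyp_sent)
    if phone_total == 0 or word_total == 0:
        raise ValueError("references contain no phonemes or words")
    return (phone_errors / phone_total,
            word_errors / word_total,
            sent_correct / len(refs))


# Toy data (hypothetical): two one-word sentences, with one substitution
# error in the first hypothesis (nasal assimilation not applied).
toy_refs = [[["k", "u", "ng", "m", "u", "l"]], [["h", "a", "k", "k", "j", "o"]]]
toy_hyps = [[["k", "u", "k", "m", "u", "l"]], [["h", "a", "k", "k", "j", "o"]]]
per, wer, sent_acc = g2p_metrics(toy_refs, toy_hyps)
```

On this toy data, one phoneme out of twelve is wrong, one word out of two contains an error, and one sentence out of two is fully correct. A single phoneme error makes the whole sentence incorrect, which is why sentence accuracy is the strictest of the three measures.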

Keywords