IEEE Access (Jan 2022)
Adaptive Chinese Pinyin IME for Most Similar Representation
Abstract
Many neural-network approaches have been applied to Pinyin-to-character (P2C) conversion in Chinese input method engines (IMEs). However, in previous research, the conversion effectiveness of neural P2C models relies on adequate training data, and neural networks cannot maintain high performance when conversion crosses users and domains. In this study, we propose a method that improves conversion efficiency and tracks user behavior through dynamic storage and representations that are updated with historical information from user input. Our experimental results show that the technique tracks user behavior and exhibits strong domain adaptability without requiring additional training. On the cross-domain datasets Touchpal, cMedQA1.0, and CAIL2019, compared with direct use of the neural network, its Top-1 MIU-Acc, CA, and KySS scores improve by at least 20.0%, 8.1%, and 18.3%, respectively, and the results approach those of in-domain training. Furthermore, compared with the traditional methods On-OMWA and Google IME, the proposed method improves Top-1 MIU-Acc, CA, and KySS by at least 7.8%, 2.0%, and 11.9%, and by at least 3.2%, 0.7%, and 13.9%, respectively. This demonstrates that the proposed method surpasses existing models in conversion accuracy and generality, and points to a new direction for P2C platforms.
Keywords