CFNAM-PG: Bridging Phonetic and Glyphic Information for Chinese Full Name and Abbreviation Matching Based on Simbert and DenseNet

Dongsheng Wang; Yue Feng; Jiawei Li; Sha Liu; Miaomiao Zhou; Diming Zhang; Huige Li

doi:10.1007/s44196-024-00549-x

International Journal of Computational Intelligence Systems (Jun 2024)

Affiliations

Dongsheng Wang: School of Computer, Jiangsu University of Science and Technology
Yue Feng: School of Computer, Jiangsu University of Science and Technology
Jiawei Li: School of Computer, Jiangsu University of Science and Technology
Sha Liu: School of Computer, Jiangsu University of Science and Technology
Miaomiao Zhou: School of Computer, Jiangsu University of Science and Technology
Diming Zhang: School of Computer, Jiangsu University of Science and Technology
Huige Li: School of Computer, Jiangsu University of Science and Technology

DOI: https://doi.org/10.1007/s44196-024-00549-x
Journal volume & issue: Vol. 17, no. 1
pp. 1 – 14

Abstract

Matching abbreviated names with their full names (full-abbr matching) plays a key role in data integration, address matching, information retrieval, and other fields. Traditional full-abbr matching techniques often struggle with near homophones and near homoglyphs. First, a near-homophone full-abbr matching model based on Simbert and VGG was proposed. It integrates character and speech features by leveraging a speech recognition model, and it adopts a brain-like, dual-process cognitive learning mechanism that combines linguistic knowledge with a neural network. Second, to address near-homoglyph full-abbr matching in Chinese, a DenseNet-based model that fuses glyph-structure and image features was proposed, in which statistical feature extractors separately extract feature vectors for glyphic features, including stroke, Wubi, and structural features. Lastly, the near-homophone and near-homoglyph models are coupled to work together on the full-abbr matching task, with expert knowledge serving as a component of the feature optimizer. Experimental results showed that the integrated model significantly increased matching accuracy to 87.5%, a 12.3% improvement.
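To make the idea of tolerating near-homophone and near-homoglyph errors concrete, here is a minimal toy sketch. It is not the CFNAM-PG model: it swaps the learned Simbert/VGG and DenseNet representations for handcrafted similarities, namely a pinyin comparison (initial, final, tone) for the phonetic side and stroke counts plus component overlap for the glyphic side (no Wubi codes). The feature tables, the simplified component sets, the weights, the `max` coupling rule, the 0.7 threshold, and every function name (`char_sim`, `align_score`, `is_abbreviation`) are hypothetical illustrations. The abbreviation is aligned in order against a subsequence of the full name with dynamic programming, and the average per-character similarity is used as the match score.

```python
"""Toy phonetic + glyphic full-abbr matcher (illustrative only, NOT CFNAM-PG).

All tables, weights, and the threshold are hypothetical hand-picked values;
component sets are simplified for illustration.
"""

# Hypothetical feature tables for a handful of characters.
PINYIN = {"北": "bei3", "背": "bei4", "京": "jing1", "大": "da4",
          "太": "tai4", "学": "xue2", "清": "qing1", "华": "hua2"}
STROKES = {"北": 5, "背": 9, "京": 8, "大": 3, "太": 4, "学": 8, "清": 11, "华": 6}
COMPONENTS = {
    "北": {"北"}, "背": {"北", "月"}, "京": {"亠", "口", "小"}, "大": {"大"},
    "太": {"大", "丶"}, "学": {"⺍", "冖", "子"}, "清": {"氵", "青"}, "华": {"化", "十"},
}
# Initial pairs that are often confused (a source of near homophones).
FUZZY_INITIALS = {frozenset(p) for p in [("zh", "z"), ("ch", "c"), ("sh", "s"), ("n", "l")]}
SINGLE_INITIALS = set("bpmfdtnlgkhjqxrzcsyw")


def split_pinyin(p):
    """Split a tone-numbered syllable such as 'jing1' into (initial, final, tone)."""
    base, tone = p[:-1], p[-1]
    if base[:2] in ("zh", "ch", "sh"):
        initial = base[:2]
    elif base[0] in SINGLE_INITIALS:
        initial = base[0]
    else:
        initial = ""  # zero-initial syllable such as 'an'
    return initial, base[len(initial):], tone


def phonetic_sim(a, b):
    """Score in [0, 1] from matching initials and finals, with a tone penalty."""
    ia, fa, ta = split_pinyin(PINYIN[a])
    ib, fb, tb = split_pinyin(PINYIN[b])
    if ia == ib:
        init = 1.0
    elif frozenset((ia, ib)) in FUZZY_INITIALS:
        init = 0.8
    else:
        init = 0.0
    fin = 1.0 if fa == fb else 0.0
    tone = 1.0 if ta == tb else 0.9
    return (0.5 * init + 0.5 * fin) * tone


def glyph_sim(a, b):
    """Score in [0, 1]: half component Jaccard overlap, half stroke-count closeness."""
    ca, cb = COMPONENTS[a], COMPONENTS[b]
    jaccard = len(ca & cb) / len(ca | cb)
    sa, sb = STROKES[a], STROKES[b]
    stroke = 1.0 - abs(sa - sb) / max(sa, sb)
    return 0.5 * jaccard + 0.5 * stroke


def char_sim(a, b):
    if a == b:
        return 1.0
    if a not in PINYIN or b not in PINYIN:
        return 0.0
    # Couple the two views by keeping the stronger error hypothesis.
    return max(phonetic_sim(a, b), glyph_sim(a, b))


def align_score(full, abbr):
    """Best average similarity of abbr aligned, in order, to a subsequence of full."""
    m, n = len(abbr), len(full)
    if m == 0 or m > n:
        return 0.0
    neg = float("-inf")
    # dp[i][j] = best total similarity for abbr[:i] aligned within full[:j]
    dp = [[0.0] * (n + 1)] + [[neg] * (n + 1) for _ in range(m)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            skip = dp[i][j - 1]
            take = dp[i - 1][j - 1] + char_sim(abbr[i - 1], full[j - 1])
            dp[i][j] = max(skip, take)
    return dp[m][n] / m


def is_abbreviation(full, abbr, threshold=0.7):
    return align_score(full, abbr) >= threshold


if __name__ == "__main__":
    print(round(align_score("北京大学", "北大"), 4))  # 1.0, exact abbreviation
    print(round(align_score("北京大学", "背大"), 4))  # 0.95, near homophone 背/北
    print(round(align_score("北京大学", "北太"), 4))  # 0.8125, near homoglyph 太/大
    print(is_abbreviation("北京大学", "清华"))        # False
```

Taking the maximum of the two scores lets a character be rescued by either error type: 背 sounds like 北 but looks different, while 太 looks like 大 but sounds different. A weighted sum would dilute both cases. The in-order alignment reflects the common pattern that abbreviation characters appear in the same order as in the full name.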

Published in International Journal of Computational Intelligence Systems

ISSN: 1875-6891 (Print); 1875-6883 (Online)
Publisher: Springer
Country of publisher: Switzerland
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.springer.com/journal/44196
