Applied Sciences (Dec 2023)

Enhanced Chinese Domain Named Entity Recognition: An Approach with Lexicon Boundary and Frequency Weight Features

  • Yan Guo,
  • Shixiang Feng,
  • Fujiang Liu,
  • Weihua Lin,
  • Hongchen Liu,
  • Xianbin Wang,
  • Junshun Su,
  • Qiankai Gao

DOI
https://doi.org/10.3390/app14010354
Journal volume & issue
Vol. 14, no. 1
p. 354

Abstract

Named entity recognition (NER) plays a crucial role in information extraction but faces challenges in the Chinese context. In Chinese paleontology popular science in particular, NER suffers from low recognition performance on long and nested entities and from the complexity of handling mixed Chinese–English text. This study aims to improve NER performance in this domain. We propose an approach that uses a multi-head self-attention mechanism to integrate Chinese lexicon-level features: by combining Chinese lexicon boundary features with domain term frequency weight features, the method enhances the model’s perception of entity boundaries, relative positions, and types. To address the inconsistency between training and prediction, we introduce a novel data augmentation method that generates augmented data from the difference set between the full set of entity types and the entity types in a sample. Experiments on four Chinese datasets, namely Resume, Youku, SubDuIE, and our PPOST, show that our approach outperforms the baselines, with F1-score improvements of 0.03%, 0.16%, 1.27%, and 2.28%, respectively. These results confirm the effectiveness of integrating Chinese lexicon boundary and domain term frequency weight features in NER. Our work provides insights for improving the applicability and performance of NER in other Chinese domain scenarios.
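
The abstract does not say how the lexicon boundary and frequency weight features are computed, so the following is only a minimal sketch of one plausible reading. It assumes a BMES-style boundary tag (Begin, Middle, End, Single) for every lexicon word that matches a span of the sentence, and a weight equal to the word's relative frequency in a domain lexicon. The function name, the toy lexicon, and its counts are hypothetical and are not taken from the paper.

```python
def lexicon_boundary_features(sentence, lexicon_freq):
    """Return, for each character, a list of (tag, word, weight) tuples.

    tag    -- assumed BMES boundary label of the character within the matched word
    weight -- assumed frequency weight: word count divided by the total count
    """
    total = sum(lexicon_freq.values())
    max_len = max((len(word) for word in lexicon_freq), default=0)
    features = [[] for _ in sentence]

    for start in range(len(sentence)):
        # Try every span starting here, up to the longest lexicon word.
        last_end = min(len(sentence), start + max_len)
        for end in range(start + 1, last_end + 1):
            word = sentence[start:end]
            if word not in lexicon_freq:
                continue
            weight = lexicon_freq[word] / total
            if len(word) == 1:
                features[start].append(("S", word, weight))
                continue
            features[start].append(("B", word, weight))
            for mid in range(start + 1, end - 1):
                features[mid].append(("M", word, weight))
            features[end - 1].append(("E", word, weight))
    return features


# Hypothetical domain lexicon with raw term counts.
toy_lexicon = {"三叶虫": 3, "化石": 5, "三叶虫化石": 2}
for char, feats in zip("三叶虫化石", lexicon_boundary_features("三叶虫化石", toy_lexicon)):
    print(char, feats)
```

In this toy input the shorter term "三叶虫" is nested inside "三叶虫化石", so each of its characters receives a feature from both matches. In a model of the kind the abstract describes, per-character feature sets like these would be embedded and fused with the character representations, for example through multi-head self-attention; that step is not shown here.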

Keywords