Tehnički Vjesnik (Jan 2023)

Chinese Named Entity Recognition Method for Domain-Specific Text

  • He Liu,
  • Yuekun Ma,
  • Chang Gao,
  • Jia Qi,
  • Dezheng Zhang

DOI
https://doi.org/10.17559/TV-20230324000477
Journal volume & issue
Vol. 30, no. 6
pp. 1799 – 1808

Abstract

Chinese named entity recognition (NER) is a critical task in natural language processing that aims to identify and classify named entities in text. However, because domain texts are highly specialised and large-scale labelled datasets are lacking, NER methods trained on public domain corpora perform poorly on domain-specific text. This paper proposes a named entity recognition method that incorporates sentence semantic information. An attention mechanism and a gating mechanism adaptively fuse sentence semantic information into character semantic information, enhancing the entity feature representation while attenuating the noise introduced by irrelevant character information. To address the lack of large-scale labelled samples, we use data self-augmentation to expand the training set. Because the low-quality samples produced by self-augmentation can harm the model, we further introduce a weighted strategy. Experiments on a traditional Chinese medicine (TCM) prescriptions corpus show that our method achieves higher F1 scores than the comparison methods.
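The fusion step described above can be illustrated with a small sketch. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: the function name `fuse`, the dot-product attention score, the scalar sigmoid gate over the concatenated vectors, and the parameters `gate_w` and `gate_b` are all hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(char_vecs, sent_vec, gate_w, gate_b=0.0):
    """Hypothetical gated fusion of a sentence vector into character vectors.

    char_vecs: list of character vectors, each of dimension d
    sent_vec:  sentence vector of dimension d
    gate_w:    gate weights of dimension 2 * d (assumed, not from the paper)
    """
    # Attention: relevance of each character to the sentence vector.
    attn = softmax([dot(h, sent_vec) for h in char_vecs])
    fused = []
    for h, a in zip(char_vecs, attn):
        # Gate: a scalar in (0, 1) computed from the concatenation [h; s].
        g = sigmoid(dot(gate_w, h + sent_vec) + gate_b)
        # Mix: keep (1 - g) of the character vector and admit g of the
        # attention-scaled sentence vector.
        fused.append([(1 - g) * hi + g * a * si for hi, si in zip(h, sent_vec)])
    return attn, fused


if __name__ == "__main__":
    chars = [[1.0, 0.0], [0.0, 1.0]]
    sentence = [1.0, 0.0]
    weights, out = fuse(chars, sentence, gate_w=[0.0, 0.0, 0.0, 0.0])
    print(weights)
    print(out)
```

In this sketch a gate near 0 leaves a character vector almost unchanged, while a gate near 1 replaces it with the attention-scaled sentence vector; a trained model would learn `gate_w` and `gate_b` rather than fix them by hand.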

Keywords