Jisuanji kexue (Mar 2023)

Study on Chinese Named Entity Extraction Rules Based on Boundary Location and Correction

  • LIU Pan, GUO Yanming, LEI Jun, LAO Mingrui, LI Guohui

DOI
https://doi.org/10.11896/jsjkx.220200020
Journal volume & issue
Vol. 50, no. 3
pp. 276 – 281

Abstract


Unlike English text, which is naturally composed of words, Chinese text has no word delimiters, so Chinese characters combine more flexibly and entity boundaries are harder to determine in Chinese named entity recognition (NER). Current mainstream methods transform the NER task into a sequence labeling task. This paper studies the predicted label sequences under the BIOES tagging scheme and calculates entity boundary accuracy by considering the entity head label B and the tail label E separately; the results show that increasing boundary accuracy can further improve the accuracy of entity recognition. We expand the boundaries of entities with continuous labels, use the label type of the last character of an entity to correct the entity type, and use word segmentation information to complete entities with incomplete labels. Finally, this paper proposes a BIO+ES labeling scheme that adds boundary labels to distinguish non-entity characters at entity boundaries, which further improves the performance of Chinese NER.
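As a rough illustration of the head/tail boundary analysis described in the abstract, the following Python sketch decodes BIOES label sequences into entity spans and measures how many gold entities have their head (B or S position) or tail (E or S position) matched by the prediction. The function names `bioes_spans` and `boundary_accuracy`, and the exact definition of the scores (matches divided by the number of gold entities), are assumptions made for this example; the abstract does not give the paper's formula. The decoder takes the entity type from the tail label, which loosely mirrors the idea of using the last character's label type to fix the entity type.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start index, end index inclusive, entity type)


def bioes_spans(tags: List[str]) -> List[Span]:
    """Decode a BIOES tag sequence into entity spans.

    Hypothetical helper: the entity type is read from the closing
    E (or S) label, and unclosed B...I runs are simply dropped.
    """
    spans: List[Span] = []
    start = None
    for i, tag in enumerate(tags):
        prefix, _, etype = tag.partition("-")
        if prefix == "S":
            spans.append((i, i, etype))
            start = None
        elif prefix == "B":
            start = i
        elif prefix == "E" and start is not None:
            spans.append((start, i, etype))
            start = None
        elif prefix == "O":
            start = None
    return spans


def boundary_accuracy(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Return (head, tail, exact) match rates over gold entities.

    Assumed definition for illustration only: each score is the share
    of gold entities whose head, tail, or full span is also predicted.
    """
    gold_spans = bioes_spans(gold)
    pred_spans = bioes_spans(pred)
    n = len(gold_spans)
    if n == 0:
        return 0.0, 0.0, 0.0
    pred_heads = {(s, t) for s, _, t in pred_spans}
    pred_tails = {(e, t) for _, e, t in pred_spans}
    pred_full = set(pred_spans)
    head = sum((s, t) in pred_heads for s, _, t in gold_spans) / n
    tail = sum((e, t) in pred_tails for _, e, t in gold_spans) / n
    exact = sum(span in pred_full for span in gold_spans) / n
    return head, tail, exact


# Toy example: the seven characters of "南京市长江大桥".
gold = ["B-LOC", "I-LOC", "E-LOC", "B-LOC", "I-LOC", "I-LOC", "E-LOC"]
pred = ["B-LOC", "E-LOC", "O", "B-LOC", "I-LOC", "I-LOC", "E-LOC"]
print(boundary_accuracy(gold, pred))
```

In this toy case both entity heads are found but one tail is misplaced, so the head score is higher than the tail and exact-match scores. This is the kind of gap that boundary correction rules would aim to close.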

Keywords