智能科学与技术学报 (Chinese Journal of Intelligent Science and Technology), Jan 2024
Named Entity Recognition based on Span and Category Enhancement for Chinese News
Abstract
In the news domain, named entity recognition is complicated by complex syntactic structures and long entity names. These make entity boundaries hard to determine and cause sequence labeling methods to stop prematurely when predicting long entities. To address these issues, this paper proposes SpaCE (Named Entity Recognition based on Span and Category Enhancement for Chinese News), a model for identifying named entities in Chinese news. SpaCE is built on the pre-trained BERT model and enhanced with span prediction and category descriptions. When encoding news text, the model incorporates category descriptions to enrich its semantic knowledge, and it adopts span-based decoding so that predictions of long entities are not interrupted. In addition, word boundary information is introduced through precise labeling, and the entity matching strategy is optimized, which effectively reduces the spurious non-entity matches produced by span decoding. SpaCE outperforms the baseline models on all three datasets. Moreover, on a dataset with disordered text, SpaCE still shows strong named entity recognition ability, indicating good robustness.
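To illustrate why span-based decoding avoids the interruption problem, the following is a minimal sketch of generic start/end span decoding. It is not SpaCE's implementation. The function name `decode_spans`, the per-token label format, and the matching rule are all illustrative assumptions. The rule pairs each predicted start with the nearest same-category end and drops a start when another same-category start appears first. Only the two boundary predictions determine an entity, so an error on a token in the middle of a long name cannot cut the entity short, as it can in BIO-style sequence labeling.

```python
from typing import List, Tuple

# (start index, end index inclusive, category)
Span = Tuple[int, int, str]


def decode_spans(start_labels: List[str], end_labels: List[str]) -> List[Span]:
    """Pair per-token start/end predictions into entity spans.

    start_labels[i] / end_labels[i] hold a category name if token i is
    predicted to start / end an entity of that category, otherwise "O".
    Assumed matching heuristic (hypothetical, not the paper's exact rule):
    match each start to the nearest end of the same category at or after
    it, and abandon the start if another same-category start intervenes,
    which filters out some implausible non-entity spans.
    """
    spans: List[Span] = []
    n = len(start_labels)
    for i, cat in enumerate(start_labels):
        if cat == "O":
            continue
        for j in range(i, n):
            if j > i and start_labels[j] == cat:
                break  # a later start of the same category takes over
            if end_labels[j] == cat:
                spans.append((i, j, cat))
                break  # nearest matching end found
    return spans


if __name__ == "__main__":
    # Character-level example: 新华社 (ORG) and 北京 (LOC).
    tokens = list("新华社记者在北京报道")
    start = ["O"] * len(tokens)
    end = ["O"] * len(tokens)
    start[0], end[2] = "ORG", "ORG"
    start[6], end[7] = "LOC", "LOC"
    for s, e, c in decode_spans(start, end):
        print("".join(tokens[s:e + 1]), c)
    # prints:
    # 新华社 ORG
    # 北京 LOC
```

Nearest-end matching is a common, simple choice. A model that also scores candidate spans could replace it with a learned matching step.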