Applied Sciences (Mar 2024)
Research on Chinese Named Entity Recognition Based on Lexical Information and Spatial Features
Abstract
In Chinese named entity recognition (NER), combining lexical features with character-based methods has attracted renewed research interest. Although this lexicon enhancement approach offers a new perspective, it faces two main challenges. First, character-by-character matching easily produces conflicts during lexicon matching; existing solutions try to alleviate this problem by incorporating the semantic information of words, but they still capture too little temporal-sequence or global information. Second, because dictionaries are limited, a sentence may contain words that match no dictionary entry, and in this situation existing lexicon enhancement methods are ineffective. To address these issues, this paper proposes a method based on lexical information and spatial features. First, the method considers the neighborhood and overlap relationships of characters within matched words and builds global bidirectional semantic and temporal-sequence information, which reduces the impact that fusing conflicting words with characters has on entity segmentation. Second, a point-by-point convolutional network extracts an attention score matrix that captures the local spatial relationship between characters without fused lexical information and characters with fused lexical information, aiming to compensate for information loss and strengthen spatial connections. Compared with the baseline model, the proposed SISF method improves the F1 score by 0.72%, 3.12%, 1.07%, and 0.37% on the Resume, Weibo, OntoNotes 4.0, and MSRA datasets, respectively.
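The matching conflict described in the abstract can be illustrated with a minimal sketch. The sentence, the toy lexicon, and the helper names `match_lexicon` and `overlapping_pairs` are assumptions chosen for illustration; they are not taken from the paper and do not reproduce its method. The sketch only shows how character-by-character dictionary matching can return words that partially overlap, so that one character belongs to several competing segmentations.

```python
def match_lexicon(sentence, lexicon, max_len=4):
    """Return (start, end, word) for every lexicon word found in the sentence.

    Hypothetical helper: scans each start position and tries every
    substring of length 2..max_len against the lexicon.
    """
    matches = []
    for start in range(len(sentence)):
        for end in range(start + 2, min(start + max_len, len(sentence)) + 1):
            word = sentence[start:end]
            if word in lexicon:
                matches.append((start, end, word))
    return matches


def overlapping_pairs(matches):
    """Return pairs of matched words that partially overlap.

    Two words conflict here when they share characters but neither
    fully contains the other.
    """
    pairs = []
    for i, (s1, e1, w1) in enumerate(matches):
        for s2, e2, w2 in matches[i + 1:]:
            if s1 < s2 < e1 < e2:
                pairs.append((w1, w2))
    return pairs


# Toy example (assumed, not from the paper).
sentence = "南京市长江大桥"
lexicon = {"南京", "南京市", "市长", "长江", "长江大桥", "大桥"}

matches = match_lexicon(sentence, lexicon)
conflicts = overlapping_pairs(matches)

for left, right in conflicts:
    print(f"{left} overlaps {right}")
```

In this toy case the word 市长 ("mayor") shares a character with 南京市 ("Nanjing City") and with 长江 ("Yangtze River"), so fusing every matched word into its characters would mix two incompatible segmentations. This is the kind of conflict the abstract says the proposed method is designed to handle.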
Keywords