Jisuanji kexue yu tansuo (Jun 2024)

Nested Named Entity Recognition Combining Multi-modal and Multi-span Features

  • QIU Yunfei, XING Haoran, YU Zhilong, ZHANG Wenwen

DOI
https://doi.org/10.3778/j.issn.1673-9418.2302029
Journal volume & issue
Vol. 18, no. 6
pp. 1613 – 1626

Abstract

Nested named entity recognition (NNER) has become a research hotspot in information extraction because of its growing practical importance. However, owing to scarce corpus resources, limited exhaustive enumeration windows, and missing span features, NNER research in vertical domains has progressed slowly, and models still misrecognize or omit entities. To address these problems, a vertical-domain NNER model based on mineralogy and a corpus-aware dictionary is proposed. First, pointwise mutual information (PMI), the term frequency-inverse document frequency (TF-IDF) algorithm, and an attention mechanism are combined to automatically integrate the corpus-aware dictionary, and anchor-text knowledge is used to improve the model's training accuracy. Second, from a shared perspective, three multi-modal information fusion strategies are designed to train the encoder to learn extended vector representations of characters, glyphs, and vocabulary. Through a triple product operation and a slicing attention mechanism, the private representations captured by the multi-layer perceptron are screened and integrated to narrow the spatial gap between heterogeneous features. Third, the contextual association between spans is determined by a bottom-up hierarchical architecture, and the set of span proposals is generated. Features such as the relation between the target span and its adjacent spans, the internal representation of the target span, and the target span boundary are obtained by a biaffine mechanism and a linear classifier. Finally, the corresponding entity type label is assigned to the target span. Experimental results on six datasets show that, compared with the baseline model, the proposed method achieves significant performance improvements and effectively improves NNER in low-resource scenarios.
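
The abstract names pointwise mutual information and TF-IDF as ingredients of the dictionary construction but does not say how they are computed or combined. As a rough illustration only, the two statistics in their textbook form might be sketched as follows; the function names `pmi_scores` and `tf_idf`, the use of adjacent token pairs as dictionary candidates, and the count-based probability estimates are all assumptions made here, not details taken from the paper.

```python
import math
from collections import Counter


def pmi_scores(sentences):
    """Return PMI for every adjacent token pair seen in `sentences`.

    PMI(x, y) = log(p(x, y) / (p(x) * p(y))), estimated from raw counts.
    Hypothetical helper: the paper's actual candidate selection is not given.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (x, y), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log(p_xy / (p_x * p_y))
    return scores


def tf_idf(term, doc, docs):
    """Plain TF-IDF of `term` in `doc`, relative to the collection `docs`.

    Each document is a list of tokens. Returns 0.0 if the term never occurs.
    """
    df = sum(1 for d in docs if term in d)
    if df == 0 or not doc:
        return 0.0
    tf = doc.count(term) / len(doc)
    return tf * math.log(len(docs) / df)


# Toy corpus of pre-tokenized sentences (illustrative data only).
corpus = [["a", "b"], ["a", "b"], ["c", "d"]]
pmi = pmi_scores(corpus)
rare_term_weight = tf_idf("c", corpus[2], corpus)
```

Under these assumptions, a pair with high PMI is a plausible dictionary candidate, and TF-IDF could then weight the candidate by how specific it is to a document. How the attention mechanism combines the two signals is not described in the abstract and is left out of the sketch.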

Keywords