Shanghai Jiaotong Daxue xuebao (Feb 2021)

Named Entity Recognition of Enterprise Annual Report Integrated with BERT

  • ZHANG Jingyi,
  • HE Guanghui,
  • DAI Zhou,
  • LIU Yadong

DOI
https://doi.org/10.16183/j.cnki.jsjtu.2020.009
Journal volume & issue
Vol. 55, no. 02
pp. 117 – 123

Abstract

Automatically extracting key data from annual reports is an important means of business assessment. To address the complex entities, strong contextual semantics, and small scale of key entities that characterize corporate annual reports, a BERT-BiGRU-Attention-CRF model is proposed to automatically identify and extract entities from the annual reports of enterprises. Building on the BiGRU-CRF model, the BERT pre-trained language model is used to improve the generalization ability of the word vector model and to capture long-range contextual information. In addition, an attention mechanism is used to fully mine the global and local features of the text. Experiments were performed on a self-constructed corporate annual report corpus, and the model was compared with several other models. The results show that the BERT-BiGRU-Attention-CRF model achieves an F1 score (the harmonic mean of precision and recall) of 93.69%. The model outperforms the other traditional models on annual reports and is expected to provide an automatic means for enterprise assessment.
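For readers unfamiliar with the metric, the following is a minimal sketch of how an F1 score can be computed for named entity recognition. It assumes entity-level exact matching over BIO-tagged sequences, which is a common convention but is not confirmed by the abstract as the authors' evaluation protocol. The function names `bio_to_spans` and `precision_recall_f1`, and the tag names `ORG` and `MONEY`, are hypothetical and chosen only for illustration.

```python
from typing import List, Set, Tuple

# (entity type, start index, end index exclusive)
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO tag sequence."""
    spans: Set[Span] = set()
    start, etype = None, None
    # A trailing "O" sentinel flushes an entity that ends the sequence.
    for i, tag in enumerate(tags + ["O"]):
        inside = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not inside:
            spans.add((etype, start, i))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def precision_recall_f1(
    gold: List[List[str]], pred: List[List[str]]
) -> Tuple[float, float, float]:
    """Entity-level precision, recall, and F1 under exact span matching."""
    gold_spans, pred_spans = set(), set()
    for idx, (g, p) in enumerate(zip(gold, pred)):
        # Prefix each span with the sentence index so that spans from
        # different sentences never collide.
        gold_spans |= {(idx, *s) for s in bio_to_spans(g)}
        pred_spans |= {(idx, *s) for s in bio_to_spans(p)}
    tp = len(gold_spans & pred_spans)
    precision = tp / len(pred_spans) if pred_spans else 0.0
    recall = tp / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Illustrative data only (hypothetical tags, not from the paper's corpus).
gold = [["B-ORG", "I-ORG", "O", "B-MONEY", "O"]]
pred = [["B-ORG", "I-ORG", "O", "O", "O"]]
precision, recall, f1 = precision_recall_f1(gold, pred)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.4f}")
```

In this toy case the prediction recovers one of the two gold entities and proposes no spurious ones, so precision is 1.0, recall is 0.5, and their harmonic mean is 2/3.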

Keywords