Моделирование и анализ информационных систем (Dec 2023)

Extracting named entities from Russian-language documents with different expressiveness of structure

  • Maria D. Averina,
  • Olga A. Levanova

DOI
https://doi.org/10.18255/1818-1015-2023-4-382-393
Journal volume & issue
Vol. 30, no. 4
pp. 382 – 393

Abstract

This work addresses named entity recognition in Russian-language texts using a CRF (conditional random field) model. Two data sets were considered: well-structured refinancing documents and semi-structured court records. The model was tested with various sets of text features and CRF parameters (optimization algorithms). Averaged over all entities, the best F-measure was 0.99 for the structured documents and 0.86 for the semi-structured ones.
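
To make the notion of a "set of text features" concrete, here is a minimal sketch of per-token feature extraction of the kind usually fed to a CRF tagger. The abstract does not list the features or the library used, so the feature names, the helper functions `token_features` and `sentence_features`, and the sample sentence are all illustrative assumptions.

```python
# Illustrative sketch only: the feature set below is an assumption,
# not the feature set evaluated in the paper.

def token_features(tokens, i):
    """Return a feature dict describing tokens[i] and its immediate neighbours."""
    word = tokens[i]
    feats = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),   # capitalised, as names often are
        "word.isupper": word.isupper(),   # all caps, as in abbreviations
        "word.isdigit": word.isdigit(),   # numbers, e.g. amounts or case numbers
        "suffix3": word[-3:].lower(),     # crude morphological cue
    }
    if i > 0:
        feats["prev.lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True               # beginning of sentence
    if i < len(tokens) - 1:
        feats["next.lower"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True               # end of sentence
    return feats


def sentence_features(tokens):
    """Return one feature dict per token of a tokenised sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    # Hypothetical sample sentence.
    for feats in sentence_features(["Иванов", "подал", "иск"]):
        print(feats)
```

A list of such dicts per sentence is the input format accepted by common CRF toolkits. Comparing models "under various sets of text features" then amounts to adding or removing keys in `token_features` and retraining with each optimization algorithm.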

Keywords