Extracting named entities from Russian-language documents with different expressiveness of structure

Maria D. Averina; Olga A. Levanova

doi:10.18255/1818-1015-2023-4-382-393

Моделирование и анализ информационных систем (Modeling and Analysis of Information Systems), Dec 2023

Affiliations

Maria D. Averina: P.G. Demidov Yaroslavl State University
Olga A. Levanova: P.G. Demidov Yaroslavl State University

DOI: https://doi.org/10.18255/1818-1015-2023-4-382-393
Journal volume & issue: Vol. 30, no. 4
pp. 382 – 393

Abstract

This work addresses named entity recognition in Russian-language texts using a conditional random field (CRF) model. Two datasets were considered: well-structured refinancing documents and semi-structured texts of court records. The model was tested with various sets of text features and CRF parameters (optimization algorithms). Averaged over all entities, the best F-measure was 0.99 for the structured documents and 0.86 for the semi-structured ones.
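To make the terms "text features" and "F-measure" concrete, the following is a minimal sketch in Python. It is not the authors' feature set or evaluation code: the function names `token_features` and `f_measure`, the particular features, and the sample sentence are all assumptions chosen for illustration. Per-token feature dictionaries of this general shape are the usual input format for CRF toolkits.

```python
from typing import Dict, List


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Build an illustrative feature dict for the token at position i.

    Hypothetical features (not taken from the paper): word shape,
    a 3-character suffix, and the neighbouring tokens.
    """
    tok = tokens[i]
    feats: Dict[str, object] = {
        "lower": tok.lower(),
        "is_title": tok.istitle(),   # capitalised word, e.g. a surname
        "is_upper": tok.isupper(),
        "is_digit": tok.isdigit(),
        "suffix3": tok[-3:].lower(),
    }
    if i > 0:
        feats["prev_lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True          # beginning of sentence
    if i < len(tokens) - 1:
        feats["next_lower"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True          # end of sentence
    return feats


def f_measure(tp: int, fp: int, fn: int) -> float:
    """F1 score from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up sample sentence ("Ivanov Ivan applied to the court").
tokens = ["Иванов", "Иван", "обратился", "в", "суд"]
features = [token_features(tokens, i) for i in range(len(tokens))]
```

Swapping features in and out of such a dictionary is one way to compare feature sets, as the abstract describes. The choice of optimization algorithm is a separate training parameter of the CRF implementation and is not shown here.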

Published in Моделирование и анализ информационных систем

ISSN: 1818-1015 (Print); 2313-5417 (Online)
Publisher: Yaroslavl State University
Country of publisher: Russian Federation
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: http://mais-journal.ru/
