RICH-CPL: Fact Extraction from Wikipedia-sized Corpora for Morphologically Rich Languages

Sergei Budkov; Kseniya Buraya; Andrey Filchenkov; Ivan Smetannikov; Antonina Puchkovskaia

Proceedings of the XXth Conference of Open Innovations Association FRUCT (Nov 2018)

Affiliations

Sergei Budkov: ITMO University, Saint Petersburg, Russia
Kseniya Buraya: ITMO University, Saint Petersburg, Russia
Andrey Filchenkov: ITMO University, Saint Petersburg, Russia
Ivan Smetannikov: ITMO University, Saint Petersburg, Russia
Antonina Puchkovskaia: ITMO University, Saint Petersburg, Russia

Journal volume & issue: Vol. 602, no. 23
pp. 78 – 84

Abstract

This work deals with a never-ending learning approach to fact extraction from unstructured Russian text. It continues the research on pattern learning techniques for morphologically rich, free-word-order languages. We introduce improvements to the CPL-RUS algorithm and choose the best initial parameters. We conducted experiments with the extended version, the RICH-CPL algorithm, on a corpus containing over 1.3 million pages. This paper is a shortened version of our paper [7] that also includes new modifications of the proposed methods.

Published in Proceedings of the XXth Conference of Open Innovations Association FRUCT

ISSN: 2305-7254 (Print); 2343-0737 (Online)
Publisher: FRUCT
Country of publisher: Finland
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Telecommunication
Website: http://fruct.org/publication
