Proceedings of the XXth Conference of Open Innovations Association FRUCT (Nov 2018)
RICH-CPL: Fact Extraction from Wikipedia-sized Corpora for Morphologically Rich Languages
Abstract
This work deals with a never-ending learning approach for fact extraction from unstructured Russian text. It continues research on pattern learning techniques for morphologically rich, free-word-order languages. We introduce improvements to the CPL-RUS algorithm and choose the best initial parameters. We conducted experiments with the extended version, the RICH-CPL algorithm, on a corpus containing over 1.3 million pages. This paper is a shortened version of our paper [7] that also includes new modifications of the proposed methods.