Proceedings of the XXth Conference of Open Innovations Association FRUCT (Nov 2018)

RICH-CPL: Fact Extraction from Wikipedia-sized Corpora for Morphologically Rich Languages

  • Sergei Budkov,
  • Kseniya Buraya,
  • Andrey Filchenkov,
  • Ivan Smetannikov,
  • Antonina Puchkovskaia

Journal volume & issue
Vol. 602, no. 23
pp. 78 – 84

Abstract

This work deals with the never-ending learning approach to fact extraction from unstructured Russian text. It continues research on pattern learning techniques for morphologically rich, free-word-order languages. We introduce improvements to the CPL-RUS algorithm and select the best initial parameters. We conducted experiments with the extended version, the RICH-CPL algorithm, on a corpus containing over 1.3 million pages. This paper is a shortened version of our paper [7] that also includes new modifications of the proposed methods.

Keywords