Proceedings of the XXth Conference of Open Innovations Association FRUCT (Apr 2017)

Syntax description synthesis using gradient boosted trees

  • ,
  • Arseny Astashkin,
  • Kirill Chuvilin

DOI
https://doi.org/10.23919/FRUCT.2017.8071289
Journal volume & issue
Vol. 776, no. 20
pp. 32 – 39

Abstract

The article considers partially formalized text documents, for which a formal grammar cannot be constructed. An external syntax description is therefore used to build the syntax tree. The problem is that preparing such descriptions manually is labor-intensive and requires a high level of professional expertise. It is proposed to automate this process using machine learning methods. The training set is composed from documents with a known syntax description. Each document is represented as a syntax tree using the TEXnous parser. Each node of these trees represents a syntax element, and the set of all nodes forms the training set. A way of describing a single syntax element is proposed such that the formal descriptions of syntax elements constitute the space of classes. In the article, this space is limited to the set of parser modes used during document analysis. A set of scientific articles is used for the experiments. The XGBoost implementation of gradient boosted trees is chosen to solve the resulting classification problem.
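
As a rough illustration of how syntax tree nodes might be turned into a training set, the sketch below walks a tree and emits one (features, class) pair per node, with the class being the parser mode. Everything in it is an assumption made for illustration: the `Node` structure, the feature names, and the mode labels "text" and "math" are hypothetical and are not taken from the article or from the TEXnous parser.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple, Union


@dataclass
class Node:
    """One syntax element of a parsed document (hypothetical structure)."""
    name: str                      # element name, e.g. a command or environment
    mode: Optional[str] = None     # parser mode used as the class label; None if unknown
    children: List["Node"] = field(default_factory=list)


Features = Dict[str, Union[str, int]]


def node_features(node: Node, parent: Optional[Node], depth: int) -> Features:
    """Describe a node by simple local context (illustrative feature choice)."""
    return {
        "name": node.name,
        "parent": parent.name if parent is not None else "<root>",
        "depth": depth,
        "n_children": len(node.children),
    }


def collect_samples(root: Node) -> List[Tuple[Features, str]]:
    """Traverse the tree in pre-order and return (features, mode) pairs.

    Nodes without a known mode are skipped, since they carry no label.
    """
    samples: List[Tuple[Features, str]] = []
    stack: List[Tuple[Node, Optional[Node], int]] = [(root, None, 0)]
    while stack:
        node, parent, depth = stack.pop()
        if node.mode is not None:
            samples.append((node_features(node, parent, depth), node.mode))
        # Push children in reverse so they are visited left to right.
        for child in reversed(node.children):
            stack.append((child, node, depth + 1))
    return samples


# A tiny hand-made tree; the mode labels here are made up for the example.
doc = Node("document", "text", [
    Node("section", "text"),
    Node("equation", "math", [Node("frac", "math")]),
])
samples = collect_samples(doc)
```

The resulting pairs could then be encoded numerically (for example, by one-hot encoding the categorical features) and passed to a gradient boosted tree classifier such as XGBoost; that step is omitted here because the article's actual feature encoding is not given in the abstract.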

Keywords