Natural Language Processing Journal (Dec 2024)

Persian readability classification using DeepWalk and tree-based ensemble methods

  • Mohammad Mahmoodi Varnamkhasti

Journal volume & issue
Vol. 9
p. 100116

Abstract

Readability classification (also called difficulty classification) is the task of assessing how hard a text is to understand by assigning it to one of several difficulty levels. Applications ranging from language-learning tools to website content optimization depend on it. Although numerous techniques have been proposed for readability classification in various languages, the topic has received little attention for Persian (Farsi). Persian readability analysis poses particular challenges because of the language's complex morphology and flexible syntax, which call for a customized approach to achieve accurate classification. In this research, we propose a method for sentence-level readability classification in Persian that combines graph node embeddings with tree-based classifiers. The results indicate an F1-score of up to 0.961 in predicting the readability of Persian sentences.
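
As a rough illustration of the graph-embedding side of such a pipeline, the sketch below generates the truncated random walks that DeepWalk-style methods start from. It is a minimal sketch under stated assumptions, not the method of the paper: the abstract does not say how the graph is built, so the word co-occurrence graph, the function names `build_cooccurrence_graph` and `random_walks`, the parameter values, and the toy English tokens are all hypothetical. In a full pipeline the walks would typically be passed to a skip-gram model to obtain node embeddings, which could then be aggregated per sentence and given to a tree-based classifier; those steps are omitted here.

```python
import random
from collections import defaultdict


def build_cooccurrence_graph(sentences, window=2):
    """Build an undirected graph whose nodes are words.

    Two distinct words are linked if they occur within `window` tokens
    of each other in some sentence. (Assumed construction, for illustration.)
    """
    graph = defaultdict(set)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                other = tokens[j]
                if other != word:
                    graph[word].add(other)
                    graph[other].add(word)
    return graph


def random_walks(graph, walks_per_node=5, walk_length=8, seed=0):
    """Generate truncated uniform random walks starting from every node."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # visit start nodes in a fresh order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = sorted(graph[walk[-1]])
                if not neighbors:
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Toy, hypothetical input: already-tokenized sentences.
sentences = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat", "down"],
]
graph = build_cooccurrence_graph(sentences, window=2)
walks = random_walks(graph, walks_per_node=5, walk_length=8, seed=0)
```

Each walk is a sequence of neighboring nodes, so it can be treated like a "sentence" over the graph when training an embedding model.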

Keywords