Jordanian Journal of Computers and Information Technology (Mar 2023)
Interpreting the Relevance of Readability Prediction Features
Abstract
Text readability prediction is a research area that has been widely developed for several languages but remains highly limited for Arabic. The main challenge in this area is to identify an optimal set of features that represent texts and allow us to evaluate their readability level. To address this challenge, we propose in this study various feature selection methods that can identify the most discriminating features representing Arabic texts. The second aim of this paper is to evaluate different sentence embedding approaches (ArabicBERT, AraBERT, and XLM-R) and compare their performance to that obtained using the selected linguistic features. We performed experiments with both SVM and Random Forest classifiers on two different corpora dedicated to learning Arabic as a foreign language (L2). The obtained results show that reducing the number of features improves the performance of the readability prediction models by more than 25% and 16% for the two adopted corpora, respectively. In addition, the fine-tuned ArabicBERT model performs better than the other sentence embedding methods, but provides less improvement than the feature-based models. Combining these methods with the most discriminating features produced the best performance. [JJCIT 2023; 9(1): 36-52]
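The abstract describes a pipeline of feature selection followed by classification with SVM and Random Forest. The sketch below is not the authors' code; it is a minimal illustration of that kind of pipeline, assuming scikit-learn, a placeholder matrix of linguistic features, and placeholder readability-level labels.

```python
# Minimal sketch (not the paper's implementation): feature selection
# followed by SVM / Random Forest readability classification.
# X and y are hypothetical stand-ins for the paper's linguistic
# features and L2 readability levels.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 60))     # placeholder: 200 texts x 60 linguistic features
y = rng.integers(0, 3, size=200)   # placeholder: 3 readability levels

for name, clf in [("SVM", SVC(kernel="rbf")),
                  ("RandomForest", RandomForestClassifier(n_estimators=300))]:
    pipe = Pipeline([
        ("scale", StandardScaler()),
        ("select", SelectKBest(mutual_info_classif, k=15)),  # keep the top-k features
        ("clf", clf),
    ])
    scores = cross_val_score(pipe, X, y, cv=5)
    print(f"{name}: mean cross-validated accuracy = {scores.mean():.3f}")
```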
Keywords