Computer Science Journal of Moldova (Dec 2021)

Analyzing Complex Words in Hindi using Parameters of Classical Readability Formulae (Part 1)

  • Gayatri Venugopal
  • Dhanya Pramod
  • Jatinderkumar R. Saini

Journal volume & issue
Vol. 29, no. 3(87)
pp. 366 – 387

Abstract

The readability of a passage indicates the extent to which the meaning of the text can be understood; it can be expressed as the age a person should have reached, or the grade a person should be in, to understand the text. Researchers have devised numerous word lists and readability formulae by testing the readability of texts with children and adults. Most of these resources have been built for the English language. This study aims to analyse the complex words in Hindi sentences that were derived from a Human Intelligence Task (HIT), using variables considered in widely adopted readability measures that focus on the lexical aspects of a sentence. Although earlier studies have analysed the readability of texts, this study claims to be the first to determine whether the parameters of traditional readability measures contribute significantly to context-agnostic models that classify a Hindi word as complex or simple. We report the results of two approaches used to deem a word complex and identify the better of the two. The model built using the better approach was then used to identify the most significant features.

Keywords