PeerJ Computer Science (Jan 2022)

Assessing English language sentences readability using machine learning models

  • Shazia Maqsood,
  • Abdul Shahid,
  • Muhammad Tanvir Afzal,
  • Muhammad Roman,
  • Zahid Khan,
  • Zubair Nawaz,
  • Muhammad Haris Aziz

DOI
https://doi.org/10.7717/peerj-cs.818
Journal volume & issue
Vol. 7
p. e818

Abstract


Readability has been an active field of research since the late nineteenth century and is vigorously pursued to date. The recent boom in data-driven machine learning has created a viable path forward for readability classification and ranking. The evaluation of text readability is a time-honoured problem with even more relevance in today’s information-rich world. This paper addresses the task of readability assessment for the English language. Given an input sentence, the objective is to predict its level of readability, which corresponds to the level of literacy anticipated from the target readers. This readability aspect plays a crucial role in the drafting and comprehension processes of English language learning. Selecting and presenting a suitable collection of sentences for English language learners may play a vital role in enhancing their learning curve. In this research, we used 30,000 English sentences for experimentation, annotated into seven readability levels using the Flesch-Kincaid measure. Various experiments were then conducted using five machine learning algorithms, i.e., KNN, SVM, LR, NB, and ANN. The classification models render excellent and stable results: the ANN model obtained an F-score of 0.95 on the test set. The developed model may be used in educational settings for tasks such as language learning and assessing the reading and writing abilities of a learner.
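The annotation step described above relies on the Flesch-Kincaid Grade Level formula, 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59. The following is a minimal sketch of how a sentence could be scored and binned into one of seven levels; the syllable counter is a crude heuristic and the `readability_level` cut-offs are hypothetical, since the paper does not state its exact binning scheme.

```python
import re

def count_syllables(word: str) -> int:
    # Crude heuristic: count groups of consecutive vowels,
    # drop a trailing silent 'e'; at least one syllable per word.
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def flesch_kincaid_grade(text: str) -> float:
    # Flesch-Kincaid Grade Level:
    # 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words) - 15.59)

def readability_level(grade: float, n_levels: int = 7) -> int:
    # Hypothetical binning: roughly two grade levels per readability level,
    # clamped to the range 0..n_levels-1.
    return min(max(int(grade // 2), 0), n_levels - 1)
```

In a pipeline like the one the paper describes, each sentence's level produced this way would serve as the training label for the KNN, SVM, LR, NB, and ANN classifiers.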

Keywords