SAGE Open (Dec 2024)
An Efficient Text-Mining Framework of Automatic Essay Grading Using Discourse Macrostructural and Statistical Lexical Features
Abstract
E-learning systems are transforming the educational sector and making education more affordable and accessible. Recently, many e-learning systems have been equipped with advanced technologies that support educators and increase the efficiency of teaching and learning. One such technology is Automatic Essay Grading (AEG), also called Automatic Text Scoring (ATS). Educators need to use their time more efficiently so that they can remain focused on teaching. Automatic grading systems can help, but they still face an ongoing challenge because essays involve many complex aspects, such as students’ creativity, novelty, context, subjectivity, coherence, cohesion, and homogeneity. To address this gap, the proposed study uses the Kaggle dataset of the Hewlett Foundation competition, which contains eight essay sets of student-written essays, each scored on a different range. First, a score quantification method is applied to the domain scores. The study then covers four aspects of student-written essays: cohesion features extracted via sentence connectivity, coherence features via sentence relatedness, statistical lexical features via the Term Frequency (TF)-Inverse Document Frequency (IDF) method, and discourse macrostructural features via calculating the unique pattern of each essay. Three experiments based on combinations of these features are conducted. The most effective combination is statistical lexical features with discourse macrostructural features, with Linear Regression used for score prediction. This combination achieved an average Quadratic Weighted Kappa (QWK) score of 0.9339 and outperformed previous solutions in terms of time, computation, and performance.
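For readers unfamiliar with the evaluation metric, the following is a minimal sketch of how a Quadratic Weighted Kappa score can be computed for one essay set, using the standard definition of the metric. It is not taken from the study's implementation: the function name `quadratic_weighted_kappa` and its parameters are hypothetical, and integer scores within a known range are assumed. The reported figure is an average over the eight essay sets, so a function like this would be applied once per set.

```python
from collections import Counter


def quadratic_weighted_kappa(human, predicted, min_score, max_score):
    """Quadratic Weighted Kappa between two equal-length lists of integer scores.

    Illustrative helper (name and signature are assumptions, not the paper's code).
    """
    if len(human) != len(predicted) or not human:
        raise ValueError("score lists must be non-empty and of equal length")
    n_categories = max_score - min_score + 1
    if n_categories < 2:
        raise ValueError("the score range must contain at least two values")

    n = len(human)
    observed = Counter(zip(human, predicted))  # observed (human, predicted) pair counts
    hist_human = Counter(human)                # marginal counts of human scores
    hist_pred = Counter(predicted)             # marginal counts of predicted scores

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(min_score, max_score + 1):
        for j in range(min_score, max_score + 1):
            # Quadratic penalty: grows with the squared distance between scores.
            weight = (i - j) ** 2 / (n_categories - 1) ** 2
            weighted_observed += weight * observed[(i, j)]
            # Expected count if the two raters scored independently.
            weighted_expected += weight * hist_human[i] * hist_pred[j] / n

    if weighted_expected == 0:
        # Both lists hold one identical constant score; kappa is formally
        # undefined here, and returning 1.0 is a convention chosen for this sketch.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


if __name__ == "__main__":
    # Toy scores for illustration only; these are not data from the study.
    print(quadratic_weighted_kappa([1, 2, 3, 4], [1, 2, 3, 3], 1, 4))
```

A value of 1 indicates perfect agreement between human and predicted scores, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement. Because the penalty is quadratic, a prediction that is off by two score points costs four times as much as one that is off by a single point.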