Mathematics (Oct 2024)

Hybrid Approach to Automated Essay Scoring: Integrating Deep Learning Embeddings with Handcrafted Linguistic Features for Improved Accuracy

  • Muhammad Faseeh,
  • Abdul Jaleel,
  • Naeem Iqbal,
  • Anwar Ghani,
  • Akmalbek Abdusalomov,
  • Asif Mehmood,
  • Young-Im Cho

DOI
https://doi.org/10.3390/math12213416
Journal volume & issue
Vol. 12, no. 21
p. 3416

Abstract

Automated Essay Scoring (AES) systems still struggle to deliver evaluations that are both accurate and efficient. This study introduces an approach that combines embeddings generated with RoBERTa and handcrafted linguistic features, and uses Lightweight XGBoost (LwXGBoost) as the scoring model to improve precision. The embeddings capture the contextual and semantic aspects of essay content, while the handcrafted features encode domain-specific attributes such as grammar errors, readability, and sentence length. This hybrid feature set allows LwXGBoost to handle high-dimensional data and model intricate feature interactions effectively. Our experiments on a diverse AES dataset, consisting of essays from students across various educational levels, yielded a Quadratic Weighted Kappa (QWK) of 0.941. This result indicates high scoring accuracy and suggests that the model is robust to noisy and sparse data. The research underscores the potential of integrating embeddings with traditional handcrafted features to improve automated assessment systems.
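
For readers unfamiliar with the reported metric, the following is a minimal sketch of how Quadratic Weighted Kappa is commonly computed for integer essay scores. It is not taken from the paper: the function name, its parameters, and the sample scores are illustrative assumptions, and the authors' own evaluation code may differ.

```python
def quadratic_weighted_kappa(human, predicted, min_score, max_score):
    """Illustrative QWK between two lists of integer scores.

    Hypothetical helper, not the authors' code. Scores must lie in
    [min_score, max_score].
    """
    if len(human) != len(predicted) or not human:
        raise ValueError("score lists must be non-empty and of equal length")
    n = max_score - min_score + 1
    if n < 2:
        raise ValueError("need at least two possible score values")

    # Observed agreement matrix: rows are human scores, columns are predictions.
    observed = [[0] * n for _ in range(n)]
    for h, p in zip(human, predicted):
        observed[h - min_score][p - min_score] += 1

    total = len(human)
    hist_human = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n):
        for j in range(n):
            # Quadratic penalty: larger disagreements cost more.
            weight = (i - j) ** 2 / (n - 1) ** 2
            # Expected count if the two raters were independent.
            expected = hist_human[i] * hist_pred[j] / total
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0:
        # Both raters used one identical score for every essay.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Made-up scores on a 1-4 scale, for illustration only.
human_scores = [1, 2, 3, 4, 3, 2]
model_scores = [1, 2, 3, 4, 2, 2]
kappa = quadratic_weighted_kappa(human_scores, model_scores, 1, 4)
```

A value of 1 means perfect agreement with the human rater, 0 means agreement no better than chance, and negative values mean systematic disagreement. Because the penalty grows with the square of the score difference, a prediction that misses by two points costs four times as much as one that misses by a single point.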

Keywords