Information (Aug 2025)
Automated Grading Method of Python Code Submissions Using Large Language Models and Machine Learning
Abstract
Assessment is fundamental to programming education, yet it is a labour-intensive and complex process, especially in large-scale learning contexts where it relies heavily on human teachers. This paper presents an automated grading methodology for Python programming exercises that produces both continuous and discrete grades. The methodology combines GPT-4-Turbo, a powerful large language model, with machine learning models selected through PyCaret's automated model-comparison process. The Extra Trees Regressor performed best for continuous grade prediction, with a Mean Absolute Error (MAE) of 4.43 out of 100 and an R² score of 0.83. The Random Forest Classifier attained the highest scores for discrete grade classification, achieving an accuracy of 91% and a Quadratic Weighted Kappa (QWK) of 0.84, indicating substantial agreement with human-assigned grade categories. These findings underscore the promise of combining LLMs with automated model selection to enable scalable, consistent, and equitable assessment in programming education while substantially reducing the workload of human evaluators.
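To make the model-selection and evaluation steps named in the abstract concrete, the following minimal Python sketch shows how PyCaret's automated comparison could be run over a feature table of graded submissions, and how the reported QWK agreement metric is computed with scikit-learn. The dataset file name, feature columns, and target column are illustrative assumptions, not details taken from the paper.

    import pandas as pd
    from pycaret.regression import setup, compare_models
    from sklearn.metrics import cohen_kappa_score

    # Hypothetical feature table: one row per submission, with numeric
    # features (e.g., LLM-derived rubric scores) and a human-assigned
    # grade on a 0-100 scale in a column named "grade".
    df = pd.read_csv("graded_submissions.csv")  # assumed file name

    # PyCaret cross-validates its library of regressors and ranks them;
    # the paper reports the Extra Trees Regressor winning this comparison
    # (MAE 4.43 out of 100, R² 0.83).
    setup(data=df, target="grade", session_id=42)
    best_regressor = compare_models(sort="MAE")
    print(best_regressor)

    # Quadratic Weighted Kappa, the agreement metric reported for the
    # discrete-grade classifier, penalises disagreements by their squared
    # ordinal distance. Toy labels shown here for illustration only.
    human = ["A", "B", "B", "C", "E"]
    model = ["A", "B", "C", "C", "D"]
    print(cohen_kappa_score(human, model, weights="quadratic"))

An analogous comparison with pycaret.classification would cover the discrete-grade case, where the paper reports the Random Forest Classifier as the top model.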
Keywords