Applied Sciences (Dec 2021)
A Practical Model for the Evaluation of High School Student Performance Based on Machine Learning
Abstract
The objective of this research is to develop a machine learning (ML)-based system that evaluates the performance of high school students during the semester and identifies the most significant factors affecting student performance. It also examines how model performance changes when the models are run on data that include only the most important features. The classifiers employed in the system are random forest (RF), support vector machine (SVM), logistic regression (LR) and artificial neural network (ANN) models. The Boruta algorithm was used to calculate feature importance. The dataset includes behavioral information, individual information and student scores, collected from teachers and from a one-on-one survey conducted through an online questionnaire. The most influential features in the dataset were identified, and the least important features were eliminated. The ANN, which achieved the best accuracy on the original dataset, was less accurate on the reduced dataset. In contrast, SVM performance improved, reaching the highest accuracy among the models, 0.78. The LR and RF models provided the same performance on the reduced dataset. The results show that ML models are effective for evaluating students, and stakeholders can use the identified factors to improve education.
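The feature-elimination step described above can be illustrated with a minimal sketch of the shadow-feature idea behind Boruta. This is not the authors' implementation. It makes several simplifying assumptions: importance is measured by absolute Pearson correlation with the label rather than by random forest importance, a simple majority vote replaces Boruta's statistical test, and the data and feature names (`informative_a`, `noise_c`, and so on) are synthetic and hypothetical.

```python
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def shadow_feature_selection(columns, labels, n_iter=20, seed=0):
    """Keep features that beat the best shuffled ("shadow") feature
    in more than half of the iterations.

    Simplification: real Boruta scores features with a random forest
    and applies a statistical test to the hit counts.
    """
    rng = random.Random(seed)
    hits = {name: 0 for name in columns}
    for _ in range(n_iter):
        # Shadow features: shuffled copies that cannot carry real signal.
        shadow_scores = []
        for values in columns.values():
            shuffled = values[:]
            rng.shuffle(shuffled)
            shadow_scores.append(abs(pearson(shuffled, labels)))
        threshold = max(shadow_scores)
        # A real feature scores a hit when it beats every shadow feature.
        for name, values in columns.items():
            if abs(pearson(values, labels)) > threshold:
                hits[name] += 1
    return [name for name, count in hits.items() if count > n_iter / 2]


# Synthetic data (hypothetical): two informative columns, three noise columns.
data_rng = random.Random(42)
n_samples = 200
names = ["informative_a", "informative_b", "noise_c", "noise_d", "noise_e"]
columns = {name: [data_rng.gauss(0, 1) for _ in range(n_samples)] for name in names}
labels = [
    1 if a + b + 0.3 * data_rng.gauss(0, 1) > 0 else 0
    for a, b in zip(columns["informative_a"], columns["informative_b"])
]

selected = shadow_feature_selection(columns, labels)
print("Features kept:", selected)
```

In a workflow like the one the abstract describes, the retained columns would then be used to refit each classifier so that accuracy on the full and reduced feature sets can be compared.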
Keywords