Applied Sciences (Jan 2024)
Student Performance Prediction with Regression Approach and Data Generation
Abstract
Although the modern education system is highly developed, educators continue to look for ways to improve it. Since the start of the 21st century, increasing amounts of educational data have been stored, and data mining techniques have developed rapidly. Educational data mining has therefore become a popular topic among educators who want to uncover the information hidden in educational data. Student performance prediction, a sub-branch of educational data mining, aims to predict student performance from student datasets. This research attempts to improve the performance of predictive algorithms on a 5-level student performance grading system. To do so, it replaces the classification approach, which is currently widely used in student performance prediction, with a regression approach, and it enlarges small datasets with synthetic data. The algorithms used include Support Vector Machine (SVM), Random Forest (RF), Neural Network (NN), and Generative Adversarial Networks (GANs). The results show that the regression approach outperforms the classification approach in predicting student performance. This research also explores the use of synthetic student data to augment small educational datasets. Because courses and evaluation systems differ among regions, student data are hard to collect or merge; augmenting small student datasets with synthetic data may help educators better evaluate their teaching skills. Using a regression approach with synthetic data improves prediction accuracy by up to 21.9%, 15.6%, and 6.6% for SVM, NN, and RF, respectively.
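The regression framing described in the abstract can be illustrated with a minimal sketch, under stated assumptions: the five grade levels are encoded as integers 0 to 4, a single numeric feature stands in for the student attributes, and a plain least-squares line stands in for the SVM, RF, and NN regressors used in the paper so that the sketch needs no external libraries. The continuous prediction is rounded and clipped back onto the grade scale before accuracy is measured. The helper names (`fit_simple_regression`, `to_grade`, `accuracy`) and the toy data are hypothetical and are not taken from the paper; GAN-based data generation is not shown.

```python
# Hypothetical sketch of regression-based grade prediction (not the paper's code).
# Assumptions: grades are encoded 0..4, there is a single numeric feature, and an
# ordinary least-squares line stands in for the SVM, RF, and NN regressors.

GRADE_LEVELS = 5  # five-level grading system, encoded here as 0, 1, 2, 3, 4


def fit_simple_regression(xs, ys):
    """Fit y ≈ slope * x + intercept by ordinary least squares on one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def to_grade(score):
    """Round a continuous prediction and clip it to the valid grade range."""
    return min(GRADE_LEVELS - 1, max(0, round(score)))


def accuracy(true_grades, predicted_grades):
    """Fraction of students whose predicted grade level matches exactly."""
    hits = sum(t == p for t, p in zip(true_grades, predicted_grades))
    return hits / len(true_grades)


if __name__ == "__main__":
    # Toy data (invented for illustration): one feature per student and a grade 0..4.
    features = [1, 2, 3, 4, 5, 6, 7, 8]
    grades = [0, 0, 1, 1, 2, 3, 3, 4]

    slope, intercept = fit_simple_regression(features, grades)
    predicted = [to_grade(slope * x + intercept) for x in features]
    print(predicted, accuracy(grades, predicted))
```

Unlike a classifier, which treats the five levels as unrelated labels, the regressor sees the grades as ordered, so predicting level 3 for a level-4 student is a smaller error than predicting level 0. Python's `round` sends exact halves to the nearest even integer; a different tie-breaking rule may be preferable in practice.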
Keywords