Applied Sciences (Jan 2023)
All-Year Dropout Prediction Modeling and Analysis for University Students
Abstract
The core of dropout prediction lies in the selection of predictive models and feature tables. Machine learning models have been shown to predict student dropouts accurately. Because students may drop out in any semester, the student history data recorded in the academic management system vary in length from student to student, which poses a challenge for generating feature tables. Most current studies predict dropouts only in the first academic year and therefore avoid this issue. Their central assumption is that more than 50% of dropouts leave school in the first academic year. However, in a dataset from a Korean university, we found that dropouts are evenly distributed across all academic years. This result suggests that the characteristics of the Korean students’ data in our dataset may differ from those of students in other developed countries. More specifically, it indicates the importance of dropout prediction for students in any academic year. Based on this, we explore universal feature tables applicable to dropout prediction for university students in any academic year. We design several feature tables and compare the performance of six machine learning models on them. We find that the mean value-based feature table generalizes better, and that the model based on gradient boosting outperforms the other models. This result reveals the importance of students’ historical information in predicting dropout.
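To make the idea of a mean value-based feature table concrete, the following minimal sketch collapses per-semester histories of different lengths into one fixed-length row per student by averaging each feature. It illustrates the general technique only and is not the authors' implementation: the record layout, the feature names (`gpa`, `credits`), the sample values, and the helper `mean_feature_table` are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-semester records: (student_id, semester, gpa, credits).
# Students have different numbers of rows because they may leave in any semester.
records = [
    ("s1", 1, 3.0, 18),
    ("s1", 2, 3.4, 15),
    ("s1", 3, 3.2, 15),
    ("s2", 1, 2.0, 12),
    ("s3", 1, 4.0, 18),
    ("s3", 2, 3.0, 12),
]


def mean_feature_table(rows):
    """Collapse each student's variable-length history into one fixed-length row."""
    # Group the per-semester values by student.
    history = defaultdict(list)
    for student_id, _semester, gpa, credits in rows:
        history[student_id].append((gpa, credits))

    # Average every feature over however many semesters the student has.
    table = {}
    for student_id, semesters in history.items():
        gpas, credit_loads = zip(*semesters)
        table[student_id] = {
            "mean_gpa": mean(gpas),
            "mean_credits": mean(credit_loads),
        }
    return table


feature_table = mean_feature_table(records)
for student_id, features in sorted(feature_table.items()):
    print(student_id, features)
```

Every student ends up with the same set of columns regardless of how many semesters were recorded, so a single model can be trained on students from any academic year.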
Keywords