Proceedings of the XXth Conference of Open Innovations Association FRUCT (May 2023)
Model for the Prediction of Dropout in Higher Education in Peru Applying Machine Learning Algorithms: Random Forest, Decision Tree, Neural Network and Support Vector Machine
Abstract
University dropout is a problem that affects not only students but also their families, universities, and society at large. The problem is global and is observed in many countries; however, few existing solutions make efficient use of the available technology and information. This study therefore implements a predictive analysis model that identifies students at risk of dropping out of Peruvian universities, together with the variables that influence dropout. The model is developed following the Cross Industry Standard Process for Data Mining (CRISP-DM) methodology and uses four Machine Learning algorithms: Random Forest, Decision Tree, Neural Network, and Support Vector Machine. Five CRISP-DM phases are applied: business understanding, data understanding, data preparation, modeling, and evaluation. The experiment was based on a survey of 385 students from public and private universities in Peru, covering cognitive, affective, family environment, pre-university, career, and university variables. The results showed that the most influential variables in predicting university dropout were "age", "term", and the student's "financing method". The Random Forest algorithm achieved the best performance, with an AUC of 0.9623 in predicting university dropout.
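The models are compared by their AUC (area under the ROC curve). As a minimal illustrative sketch, not the authors' code, the function below computes ROC AUC from its probabilistic definition. It gives the probability that a randomly chosen student who dropped out receives a higher predicted dropout score than a randomly chosen student who stayed enrolled, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the study's survey data.

```python
from itertools import product


def roc_auc(labels, scores):
    """Return ROC AUC as P(score of a positive > score of a negative).

    labels: 1 = dropped out (positive), 0 = stayed enrolled (negative).
    scores: predicted dropout probabilities from any classifier.
    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    # Compare every (positive, negative) pair of students.
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, for illustration only.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1]
print(round(roc_auc(labels, scores), 4))  # prints 0.9167 (11 of 12 pairs ranked correctly)
```

Under this reading, an AUC of 0.9623 means the model ranks a dropout student above a non-dropout student in about 96% of such pairs. The pairwise loop is quadratic in the number of students, which is fine for a sample of a few hundred. Larger datasets would normally use a rank-based or library implementation.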
Keywords