JTAM (Jurnal Teori dan Aplikasi Matematika) (Jul 2024)
Effectiveness of Machine Learning Models with Bayesian Optimization-Based Method to Identify Important Variables that Affect GPA
Abstract
To produce superior human resources, the SPs-IPB Master Program must consider the factors that influence GPA in its student selection process. Machine learning algorithms can be used to identify these factors. This paper applies the random forest and XGBoost algorithms to identify the significant variables that affect GPA. In the evaluation, the default model is compared with the models obtained from Bayesian optimization and random search optimization. Bayesian optimization is a hyperparameter optimization method that uses information from previous iterations to improve its estimates, which makes it highly efficient in terms of computing time. Based on the average of the balanced accuracy and sensitivity metrics, Bayesian optimization produces a model that is superior to the default model and is more time-efficient than random search optimization. The sensitivity of XGBoost is 25% better than that of random forest. However, random forest is 19% better in accuracy and 30% better in specificity. Important variables are identified from the information gain obtained when the tree nodes are split. According to the best random forest and XGBoost models, the variables with the most influence on students' GPA are Undergraduate University Status (X8) and Undergraduate University (X6). Meanwhile, the variables with the smallest influence are Gender (X4) and Enrollment (X9).
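The evaluation metrics named in the abstract (accuracy, sensitivity, specificity, and balanced accuracy) can be illustrated with a minimal Python sketch. It assumes a binary GPA class label, which the abstract does not state explicitly, and the labels in `y_true` and `y_pred` are hypothetical values chosen for illustration, not data from the study. The function names are likewise illustrative.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive and predicted == positive:
            tp += 1
        elif actual != positive and predicted == positive:
            fp += 1
        elif actual != positive and predicted != positive:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity, specificity, and balanced accuracy."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    # Sensitivity: share of actual positives that were predicted positive.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: share of actual negatives that were predicted negative.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    # Balanced accuracy: mean of sensitivity and specificity, which is
    # less affected by class imbalance than plain accuracy.
    balanced_accuracy = (sensitivity + specificity) / 2
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": balanced_accuracy,
    }


# Hypothetical labels (1 = positive class), for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because balanced accuracy averages sensitivity and specificity, a model can lead on one of these metrics and trail on another, which is the kind of trade-off the abstract reports between XGBoost and random forest.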
Keywords