Journal of Big Data (Jan 2022)

Supervised machine learning predictive analytics for alumni income

  • Daniela A. Gomez-Cravioto,
  • Ramon E. Diaz-Ramos,
  • Neil Hernandez-Gress,
  • Jose Luis Preciado,
  • Hector G. Ceballos

DOI
https://doi.org/10.1186/s40537-022-00559-6
Journal volume & issue
Vol. 9, no. 1
pp. 1 – 31

Abstract

Background: This paper explores machine learning algorithms and approaches for predicting alumni income, with the aim of gaining insight into the strongest predictors and into a "high earners" class.

Methods: It examines alumni sample data obtained from a survey conducted at a multicampus private university in Mexico. The survey results comprise 17,898 observations before cleaning and pre-processing and 12,275 afterwards. The dataset contains income values and a large set of independent demographic attributes of former students. We conduct an in-depth analysis to determine whether the accuracy of traditional algorithms can be improved with a data science approach. Furthermore, we present insights on patterns obtained using explainable artificial intelligence techniques.

Results: The machine learning models outperformed the parametric models of linear and logistic regression in predicting alumni's current income, with statistically significant results (p < 0.05) in three different tasks. However, the latter methods were found to be the most accurate in predicting alumni's first income after graduation.

Conclusion: We identified that age, gender, working hours per week, first income, and variables related to the alumni's job position and firm contributed to explaining their current income. The findings indicated a gender wage gap, suggesting that further work is needed to enable equality.

Keywords