IEEE Access (Jan 2024)

Identifying and Understanding Student Dropouts Using Metaheuristic Optimized Classifiers and Explainable Artificial Intelligence Techniques

  • Radic Goran,
  • Luka Jovanovic,
  • Nebojsa Bacanin,
  • Milos S. Stankovic,
  • Vladimir Simic,
  • Milos Antonijevic,
  • Miodrag Zivkovic

DOI
https://doi.org/10.1109/ACCESS.2024.3446653
Journal volume & issue
Vol. 12
pp. 122377 – 122400

Abstract

This study addresses the pressing issue of student dropout in higher education institutions and explores the potential of artificial intelligence (AI) to mitigate this challenge. Student dropout is a complex phenomenon influenced by diverse internal and external factors, including student characteristics and skills. To enhance retention strategies, it is crucial to identify the nuanced reasons behind dropout decisions, which often go unnoticed by university staff. Therefore, this study investigates the integration of metaheuristic optimization techniques with Adaptive Boosting (AdaBoost) and eXtreme Gradient Boosting (XGBoost) machine learning (ML) models for student dropout identification. By leveraging these well-known ML techniques, the goal is to enhance the accuracy and reliability of dropout predictions in terms of standard classification metrics. Further, by harnessing the exploration and exploitation capabilities of metaheuristics, the study aims to fine-tune both models, thereby increasing their accuracy and robustness in identifying at-risk students. Additionally, to address limitations of existing metaheuristics, a modified version of the recently proposed Sinh Cosh Optimizer (SCHO) was developed, which generates well-performing XGBoost and AdaBoost models for student dropout prediction. The study demonstrates that both tuned models can effectively identify at-risk students, providing valuable insights for targeted educational support initiatives. Three experimental evaluations, two with binary and one with multi-class student dropout classification, are conducted on real-world datasets, along with rigorous comparative analysis and statistical validation against other cutting-edge metaheuristics. According to the experimental outcomes, the proposed methodology significantly outperforms the other approaches.
Finally, a comprehensive analysis of influential factors was performed using SHapley Additive exPlanations (SHAP) and Shapley Additive Global importancE (SAGE) explainable AI techniques on the best generated models to identify the factors that most significantly influence dropout decisions. This work contributes to advancing AI applications in higher education, providing insights for policymakers and institutions to design targeted interventions for student retention, ultimately enhancing the overall success and effectiveness of higher education systems.

Keywords