IEEE Access (Jan 2022)

An Explainable Model for Identifying At-Risk Student at Higher Education

  • Sarah Alwarthan,
  • Nida Aslam,
  • Irfan Ullah Khan

DOI
https://doi.org/10.1109/ACCESS.2022.3211070
Journal volume & issue
Vol. 10
pp. 107649 – 107668

Abstract

In recent years, researchers from various fields have shown great interest in improving the quality of learning in educational institutions in order to raise student achievement and learning outcomes. The main objective of this study was to identify, at an early stage, students at risk of failing the preparatory year. The study applies several educational data mining algorithms, including RF, ANN, and SVM, to build three classification models that meet this objective. Feature selection methods, namely RFE and GA, were examined to find the best subset of the most influential features. Three datasets on preparatory year students from the humanities track at IAU were used. Because the collected datasets are imbalanced, sampling approaches including SMOTE and SMOTE-Tomek Link were applied, and the SMOTE-Tomek Link technique was used to balance all three datasets. The results showed that RF outperformed the other techniques, recording the highest performance for building the models. Moreover, RFE with Mutual Information found the best subset of features for building the first model. Finally, this study not only developed several classification models to identify at-risk students but also went a step further by employing XAI techniques, namely LIME, SHAP, and a global surrogate model, to explain the outputs of the proposed prediction models and highlight the reasons for student failure.
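To make the balancing step concrete, the following is a minimal sketch of the core idea behind SMOTE: new minority-class samples are interpolated between an existing minority sample and one of its nearest minority neighbours. It is not the authors' implementation, and it omits the Tomek Link cleaning step. The function name `smote`, its parameters, and the toy data are assumptions made for illustration only.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (hypothetical helper)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data: (grade-like score, attendance rate). Not from the paper.
passed = [(3.6, 0.95), (3.1, 0.90), (2.9, 0.85),
          (3.8, 0.97), (3.3, 0.92), (2.7, 0.88)]
failed = [(1.4, 0.55), (1.9, 0.60), (1.6, 0.70)]

# Oversample the minority ("failed") class until both classes are the same size.
new_failed = smote(failed, n_new=len(passed) - len(failed))
balanced_failed = failed + new_failed
print(len(passed), len(balanced_failed))
```

Because every synthetic point lies on a segment between two real minority samples, the oversampled class stays inside the region the original minority samples occupy. In practice a maintained library implementation would normally be preferred over a hand-written one.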

Keywords