Scientific Reports (Jul 2024)

Regularized ensemble learning for prediction and risk factors assessment of students at risk in the post-COVID era

  • Zardad Khan,
  • Amjad Ali,
  • Dost Muhammad Khan,
  • Saeed Aldahmani

DOI
https://doi.org/10.1038/s41598-024-66894-1
Journal volume & issue
Vol. 14, no. 1
pp. 1 – 12

Abstract

Read online

Abstract The COVID-19 pandemic has had a significant impact on students’ academic performance. The effects of the pandemic have varied among students, but some general trends have emerged. One of the primary challenges for students during the pandemic has been the disruption of their study habits. Students getting used to online learning routines might find it even more challenging to perform well in face to face learning. Therefore, assessing various potential risk factors associated with students low performance and its prediction is important for early intervention. As students’ performance data encompass diverse behaviors, standard machine learning methods find it hard to get useful insights for beneficial practical decision making and early interventions. Therefore, this research explores regularized ensemble learning methods for effectively analyzing students’ performance data and reaching valid conclusions. To this end, three pruning strategies are implemented for the random forest method. These methods are based on out-of-bag sampling, sub-sampling and sub-bagging. The pruning strategies discard trees that are adversely affected by the unusual patterns in the students data forming forests of accurate and diverse trees. The methods are illustrated on an example data collected from university students currently studying on campus in a face-to-face modality, who studied during the COVID-19 pandemic through online learning. The suggested methods outperform all the other methods considered in this paper for predicting students at the risk of academic failure. Moreover, various factors such as class attendance, students interaction, internet connectivity, pre-requisite course(s) during the restrictions, etc., are identified as the most significant features.

Keywords