IEEE Access (Jan 2024)
Enhancing the Early Student Dropout Prediction Model Through Clustering Analysis of Students’ Digital Traces
Abstract
Educational Data Mining and learning analytics have gained significant prominence in recent years, garnering attention from researchers worldwide, primarily because of their potential to improve decision-making in higher education. This study uses Educational Data Mining to propose a clustering-based approach for classifying students by academic performance, focusing on their interactions within Learning Management Systems in blended learning settings. Log data from the Moodle Learning Management System was analyzed to determine whether valuable insights could be derived from student activities without the need for personal or invasive data. Given the large dataset, Principal Component Analysis and Uniform Manifold Approximation and Projection were employed for dimensionality reduction. Three unsupervised learning models (BIRCH, DBSCAN, and GMM) were then applied to identify distinct student clusters. A comparison of the three showed that the BIRCH algorithm was the most effective at categorizing students based on their activities. Further analysis of the identified clusters revealed a strong correlation between student engagement and academic outcomes, with two high-risk clusters showing the highest dropout rates. To better understand the reasons behind the identified clusters, the course duration was then divided into intervals, and standard machine learning methods were applied to predict at-risk students over time, confirming that early identification of at-risk students using clustering is feasible. This study demonstrates the efficacy of non-invasive data collection and unsupervised learning models in predicting student success, emphasizing the importance of temporal and content-specific analysis.
The proposed approach leverages clustering algorithms to enhance the understanding of student behavior by identifying patterns and grouping students with similar learning and engagement profiles. This supports the development of targeted interventions that are tailored to specific student needs, ultimately improving retention and success.
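The interval-based step mentioned in the abstract, in which the course duration is divided into intervals before at-risk students are predicted over time, can be illustrated with a minimal sketch. It is not the authors' implementation: the event format (a student identifier paired with a day offset from the course start), the function names, the equal-width intervals, and the toy data are all assumptions made here for illustration, and no real Moodle log schema is implied.

```python
from collections import defaultdict


def interval_activity_counts(events, course_days, n_intervals):
    """Count logged events per student in each equal-width course interval.

    events: iterable of (student_id, day) pairs, where day is the offset
            from the course start (hypothetical format, not Moodle's schema).
    Returns {student_id: [count in interval 0, ..., count in interval n-1]}.
    """
    if course_days <= 0 or n_intervals <= 0:
        raise ValueError("course_days and n_intervals must be positive")
    counts = defaultdict(lambda: [0] * n_intervals)
    for student_id, day in events:
        if not 0 <= day < course_days:
            continue  # ignore events outside the course window
        idx = int(day * n_intervals / course_days)
        counts[student_id][idx] += 1
    return dict(counts)


def features_up_to(counts, last_interval):
    """Keep only the intervals observed so far (0-based, inclusive)."""
    return {sid: row[: last_interval + 1] for sid, row in counts.items()}


# Hypothetical toy log: (student_id, day offset) pairs for a 90-day course.
events = [("s1", 0), ("s1", 3), ("s1", 45), ("s2", 10), ("s2", 89), ("s3", 95)]
counts = interval_activity_counts(events, course_days=90, n_intervals=3)
early = features_up_to(counts, last_interval=0)
```

Restricting the feature vectors to the intervals observed so far, as `features_up_to` does, mimics what a model could know at an early point in the course. The resulting per-student vectors would then be passed to whichever clustering or classification method is under study.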
Keywords