Mathematics (Jun 2024)
Improving the Automatic Detection of Dropout Risk in Middle and High School Students: A Comparative Study of Feature Selection Techniques
Abstract
School dropout in underdeveloped and emerging countries is a pressing social issue, as highlighted by studies conducted by the Organization for Economic Co-operation and Development (OECD). To address this challenge and improve the automatic detection of dropout risk, this study compares five feature selection techniques. The methodological design involves three phases: data preparation, feature selection, and model evaluation using machine learning algorithms. The results show that (1) the top features identified by the feature selection techniques, namely those constructed through feature engineering, were among the most effective for classifying student dropout; (2) feature selection increased the F-score of the best model by 5%; and (3) depending on the feature selection technique, the performance of a machine learning algorithm can increase or decrease, according to its sensitivity to noisier features. In addition, metaheuristic algorithms yielded significant improvements in precision, but at the risk of increasing errors and reducing recall.
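To make the trade-off between precision, recall, and F-score mentioned above concrete, the following is a minimal sketch in plain Python. It is an illustration only, not the evaluation code used in the study: the function name `precision_recall_f1`, the label encoding (1 = dropout, 0 = enrolled), and both prediction vectors are hypothetical and invented for this example.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the positive (dropout) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical ground truth: 1 = student dropped out, 0 = stayed enrolled.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Hypothetical model A flags students liberally: it catches every dropout
# but also raises two false alarms.
y_pred_broad = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]

# Hypothetical model B flags students conservatively: no false alarms,
# but it misses half of the actual dropouts.
y_pred_strict = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

broad = precision_recall_f1(y_true, y_pred_broad)
strict = precision_recall_f1(y_true, y_pred_strict)

print("broad  (precision, recall, F1):", broad)
print("strict (precision, recall, F1):", strict)
```

In this toy data, the conservative model attains higher precision than the liberal one but lower recall and a lower F1, which is the kind of outcome the abstract warns about when a selection method favors precision. The numbers carry no information about the study's actual results.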
Keywords