Computers and Education: Artificial Intelligence (Jan 2022)

Quantifying variability in predictions of student performance: Examining the impact of bootstrap resampling in data pipelines

  • Roberto Bertolini,
  • Stephen J. Finch,
  • Ross H. Nehm

Journal volume & issue
Vol. 3, p. 100067

Abstract

Educators seek to develop accurate and timely prediction models to forecast student retention and attrition. Although prior studies have generated single point estimates to quantify predictive efficacy, far less education research has examined variability in student performance predictions using nonparametric bootstrap algorithms in data pipelines. In this study, bootstrapping was applied to examine performance variability among five data mining methods (DMMs) and four filter preprocessing feature selection techniques for forecasting course grades for 3225 students enrolled in an undergraduate biology class. While the median area under the curve (AUC) values obtained from bootstrapping were significantly lower than the AUC point estimates obtained without resampling, DMMs and feature selection techniques affected variability in different ways. The regularized regression technique elastic net (GLMNET) significantly outperformed all other DMMs and exhibited the least variability in the AUC. However, all filter feature selection techniques significantly increased variability in student success predictions relative to pipelines that omitted this step. We discuss the potential benefits and drawbacks of incorporating bootstrapping into prediction pipelines to track, monitor, and forecast classroom performance, and highlight the risks of examining point estimates alone.
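The core procedure the abstract describes, a nonparametric bootstrap that refits a classifier on each resample and scores it on the held-out (out-of-bag) students, can be sketched briefly. The sketch below is illustrative only and is not the authors' pipeline: it substitutes scikit-learn's elastic-net logistic regression for the R package GLMNET, uses synthetic stand-in data, and its replicate count and hyperparameters (l1_ratio, C) are assumptions.

```python
# Minimal sketch (assumed setup, not the authors' code): resample students
# with replacement, refit an elastic-net classifier on each bootstrap sample,
# and score AUC on the out-of-bag rows to quantify prediction variability.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
# Synthetic stand-in for the student feature matrix and pass/fail labels.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

aucs = []
for _ in range(200):  # number of bootstrap replicates (assumption)
    idx = rng.integers(0, len(y), size=len(y))       # sample rows with replacement
    oob = np.setdiff1d(np.arange(len(y)), idx)       # out-of-bag rows for scoring
    if len(np.unique(y[oob])) < 2:
        continue                                     # AUC undefined with one class
    model = LogisticRegression(
        penalty="elasticnet", solver="saga",
        l1_ratio=0.5, C=1.0, max_iter=5000,          # hypothetical hyperparameters
    ).fit(X[idx], y[idx])
    aucs.append(roc_auc_score(y[oob], model.predict_proba(X[oob])[:, 1]))

print(f"median AUC = {np.median(aucs):.3f}, "
      f"IQR = {np.percentile(aucs, 75) - np.percentile(aucs, 25):.3f}")
```

Comparing the median bootstrap AUC and its spread against a single train/test AUC from the same data illustrates the abstract's caution: a lone point estimate can overstate how stable a pipeline's predictive performance actually is.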

Keywords