Computers and Education: Artificial Intelligence (Dec 2024)
Advancing student outcome predictions through generative adversarial networks
Abstract
Predicting student outcomes is essential in educational analytics for creating personalised learning experiences. The effectiveness of these predictive models relies on access to sufficient and accurate data. However, privacy concerns and the lack of student consent often restrict data collection, limiting the applicability of predictive models. To tackle this obstacle, we employ Generative Adversarial Networks, a type of Generative AI, to generate tabular data that replicates and enlarges two distinct publicly available student datasets. The ‘Math dataset’ has 395 observations and 33 features, whereas the ‘Exam dataset’ has 1000 observations and 8 features. Using Conditional Tabular Generative Adversarial Networks (CTGAN) and Copula Generative Adversarial Networks (CopulaGAN), both available in Python libraries, our methodology consists of two phases. First, in a mirroring approach, we produce synthetic data matching the volume of the real datasets, focusing on privacy and evaluating predictive accuracy. Second, in an augmenting approach, we add newly created synthetic observations to the real datasets to fill gaps in datasets that lack student data. Before applying either approach, we validate the synthetic data using Correlation Analysis, Density Analysis, Correlation Heatmaps, and Principal Component Analysis. We then compare the accuracy of predicting whether students will pass or fail their exams across the original, synthetic, and augmented datasets. Using Feedforward Neural Networks, Convolutional Neural Networks, and Gradient-boosted Neural Networks, implemented in Python and tuned with Bayesian optimisation, this research methodically examines the impact of synthetic data on prediction accuracy. Our mirroring approach aims to achieve accuracy rates that closely align with those obtained on the original data, while our augmenting approach seeks a slightly higher accuracy than learning from the original data alone.
Our findings indicate that both approaches meet these objectives and provide actionable insights into leveraging advanced Generative AI techniques to enhance educational outcomes.