Optimizing learning outcomes: a deep dive into hybrid AI models for adaptive educational feedback

Hafiz Muhammad Qadir; M. Taseer Suleman; Rafaqat Alam Khan; Muhammad Sohaib; Md Junayed Hasan; Syed Abid Hussain

doi:10.1186/s40537-025-01187-6

Journal of Big Data (Jun 2025)

Affiliations

Hafiz Muhammad Qadir: Department of Software Engineering, Lahore Garrison University
M. Taseer Suleman: Department of Computer Science, Bahria University Lahore Campus
Rafaqat Alam Khan: Department of Software Engineering, Lahore Garrison University
Muhammad Sohaib: School of Computer Science and Technology, Zhejiang Normal University
Md Junayed Hasan: Dataxense
Syed Abid Hussain: Department of Computer Science and Engineering, Bakhtar University

DOI: https://doi.org/10.1186/s40537-025-01187-6
Journal volume & issue: Vol. 12, no. 1, pp. 1–26

Abstract

Accurate prediction of student performance is essential for building adaptive learning frameworks and making the best use of educational strategies. In this work, we apply ensemble learning and neural networks to student data from multiple sources: two real educational datasets from Kaggle and two synthetically generated datasets. One synthetic dataset was created with a Python-based generative script; the other was created by augmenting a smaller Kaggle dataset while preserving its original statistical distribution. Integrating the synthetic data is intended to make the models more robust, mitigate class imbalance, and improve predictive generalization across heterogeneous educational data. We implement several ensemble models (AdaBoost, Gradient Boosting, XGBoost, LightGBM, and CatBoost) and deep learning architectures: Deep Neural Networks (DNN), Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM) networks, and Recurrent Neural Networks (RNN). These models are evaluated using accuracy, precision, recall, F1-score, and ROC-AUC. Experimental results show that CatBoost outperforms the other ensemble models with an accuracy of 0.7143 and an F1-score of 0.7338, while CNN achieves the highest performance on sequential data (accuracy: 0.6786). ROC-AUC analysis confirms CatBoost and XGBoost as the top-performing classifiers, while CNN and DNN show superior capability in handling temporal patterns. The study highlights the impact of dataset augmentation and synthetic data generation on predictive accuracy in educational data mining, reinforcing the importance of data-centric approaches for building intelligent, evidence-driven educational systems. The learning feedback is available through a user-friendly web server at https://khan-learning-feedback.streamlit.app/.
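The abstract names five evaluation metrics but does not say how they were computed. As a reading aid only, the following dependency-free Python sketch shows how accuracy, precision, recall, F1-score, and ROC-AUC can be derived from predicted scores in the binary case. The function name `binary_metrics`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration; they are not taken from the paper, whose task setup may differ (for example, it may be multi-class).

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_score: Sequence[float],
                   threshold: float = 0.5) -> Dict[str, float]:
    """Hypothetical helper: common binary-classification metrics.

    y_true holds 0/1 labels, y_score holds predicted scores for class 1.
    The threshold of 0.5 is an assumption, not a value from the paper.
    """
    if len(y_true) != len(y_score) or not y_true:
        raise ValueError("y_true and y_score must be non-empty and equal in length")

    # Turn scores into hard predictions at the chosen threshold.
    y_pred = [1 if s >= threshold else 0 for s in y_score]

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # ROC-AUC as the probability that a randomly chosen positive example
    # scores higher than a randomly chosen negative one (ties count half).
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if pos and neg:
        wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
                   for p in pos for n in neg)
        roc_auc = wins / (len(pos) * len(neg))
    else:
        roc_auc = float("nan")  # undefined when only one class is present

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "roc_auc": roc_auc}


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    labels = [1, 0, 1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.2, 0.6, 0.4, 0.7, 0.1, 0.8, 0.3]
    for name, value in binary_metrics(labels, scores).items():
        print(f"{name}: {value:.4f}")
```

Unlike the other four metrics, ROC-AUC is computed from the raw scores rather than the thresholded predictions, which is why a classifier ranking can differ between accuracy and ROC-AUC.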

Published in Journal of Big Data

ISSN: 2196-1115 (Online)
Publisher: SpringerOpen
Country of publisher: United Kingdom
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics: Computer engineering. Computer hardware; Technology: Technology (General): Industrial engineering. Management engineering: Information technology; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://journalofbigdata.springeropen.com
