BMC Medical Education (Jan 2024)

Predicting students’ academic progress and related attributes in first-year medical students: an analysis with artificial neural networks and Naïve Bayes

  • Diego Monteverde-Suárez,
  • Patricia González-Flores,
  • Roberto Santos-Solórzano,
  • Manuel García-Minjares,
  • Irma Zavala-Sierra,
  • Verónica Luna de la Luz,
  • Melchor Sánchez-Mendiola

DOI
https://doi.org/10.1186/s12909-023-04918-6
Journal volume & issue
Vol. 24, no. 1
pp. 1 – 11

Abstract

Background: Dropout and poor academic performance are persistent problems in medical schools in emerging economies. Identifying at-risk students early, and understanding the factors that contribute to their success, would help in designing educational interventions. Educational Data Mining (EDM) methods can identify students at risk of poor academic progress and dropout. The main goal of this study was to use two machine learning models, Artificial Neural Networks (ANN) and Naïve Bayes (NB), to identify first-year medical students who succeed academically, using sociodemographic data and academic history.

Methods: Data from 7,976 medical students in seven cohorts (2011 to 2017) admitted to the National Autonomous University of Mexico (UNAM) Faculty of Medicine in Mexico City were analysed. Information from admission diagnostic exam results, academic history, sociodemographic characteristics and family environment was used. The main dataset included 48 variables. The study followed the general knowledge discovery process: pre-processing, data analysis, and validation. ANN and NB models were used for the data mining analysis.

Results: ANN models performed slightly better than NB models in accuracy, sensitivity, and specificity. Both models had better sensitivity when classifying regular students and better specificity when classifying irregular students. Of the 25 variables with the highest predictive value in the Naïve Bayes model, the percentage of correct answers in the diagnostic exam was the best predictor.

Conclusions: Both ANN and Naïve Bayes methods can be useful for predicting medical students’ academic achievement in an undergraduate program, based on information about their prior knowledge and sociodemographic factors. Although ANN offered slightly superior results, Naïve Bayes allowed an in-depth analysis of how the different variables influenced the model. Educational data mining and machine learning classification techniques have potential in medical education.
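
The abstract compares the models on accuracy, sensitivity, and specificity, reported separately for regular and irregular students. As a minimal sketch of how such per-class metrics are commonly derived from predictions, the following Python snippet may help. The function name `binary_metrics` and the labels are hypothetical illustrations, not the authors' code or data, and the sketch assumes both classes appear in the true labels.

```python
def binary_metrics(y_true, y_pred, positive):
    """Accuracy, sensitivity and specificity with `positive` as the positive class.

    Hypothetical helper for illustration; assumes both classes occur in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of true positives that were detected
        "specificity": tn / (tn + fp),  # share of true negatives that were detected
    }


# Made-up labels for illustration only; these are not data from the study.
y_true = ["regular", "regular", "regular", "irregular",
          "irregular", "regular", "irregular", "regular"]
y_pred = ["regular", "regular", "irregular", "irregular",
          "regular", "regular", "irregular", "regular"]

regular_metrics = binary_metrics(y_true, y_pred, positive="regular")
irregular_metrics = binary_metrics(y_true, y_pred, positive="irregular")

print("regular as positive class:  ", regular_metrics)
print("irregular as positive class:", irregular_metrics)
```

In a two-class setting, switching the positive class swaps sensitivity and specificity while accuracy stays the same, which is worth keeping in mind when reading per-class results.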

Keywords