BMC Medical Informatics and Decision Making (May 2025)

Development of a machine learning prediction model for loss to follow-up in HIV care using routine electronic medical records in a low-resource setting

  • Tamrat Endebu,
  • Girma Taye,
  • Wakgari Deressa

DOI
https://doi.org/10.1186/s12911-025-03030-7
Journal volume & issue
Vol. 25, no. 1
pp. 1 – 11

Abstract

Background Despite the global commitment to ending AIDS by 2030, loss to follow-up (LTFU) in HIV care remains a significant challenge. To address this issue, a data-driven clinical decision tool is needed to identify patients at greater risk of LTFU and to facilitate personalized and proactive interventions. This study aimed to develop a prediction model to assess the future risk of LTFU in HIV care in Ethiopia.

Methods The study used a retrospective design in which machine learning (ML) methods were applied to the electronic medical record (EMR) data of adult HIV-positive individuals who were newly enrolled in antiretroviral therapy between July 2019 and April 2024. The data were collected across eight randomly selected high-volume healthcare facilities. Six supervised ML classifiers (J48 decision tree, random forest, K-nearest neighbors, support vector machine, logistic regression, and naïve Bayes) were trained using Weka 3.8.6 software. The performance of each algorithm was evaluated through 10-fold cross-validation. Algorithm performance was compared using the corrected resampled t test (p < 0.05), and decision curve analysis (DCA) was used to assess the model’s clinical utility.

Results EMR data from a total of 3,720 individuals were analyzed, with 2,575 (69.2%) classified as not LTFU and 1,145 (30.8%) classified as LTFU. On the basis of the ML feature selection process, six strong predictors of LTFU were identified: differentiated service delivery model, adherence, tuberculosis preventive therapy, follow-up period, nutritional status, and address information. The random forest algorithm showed the best performance, with an accuracy of 84.2%, a sensitivity of 82.4%, a specificity of 85.7%, a precision of 83.7%, an F1 score of 83.1%, and an area under the curve of 89.5%. In the decision curve analysis, the model offered greater net benefit than both the ‘intervention for all’ approach and the ‘intervention for none’ approach, particularly at threshold probabilities of 10% and above.

Conclusions This study developed a machine learning-based predictive model for assessing the future risk of LTFU in HIV care within low-resource settings. The model built with the random forest algorithm exhibited high accuracy and strong discriminative performance, and it showed a positive net benefit, supporting its potential for clinical application. Ongoing external validation across diverse populations is important to ensure the model’s reliability and generalizability.
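
The net-benefit comparison used in decision curve analysis can be illustrated with a minimal sketch. The abstract does not describe how the DCA was computed, so the code below assumes the standard definition of net benefit (true positives per patient minus false positives per patient, weighted by the odds of the threshold probability). The function names and the four-patient toy data are hypothetical and are not taken from the study.

```python
def net_benefit(tp, fp, n, threshold):
    """Net benefit of a model at one threshold probability (standard DCA definition)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    odds = threshold / (1.0 - threshold)  # weight given to a false positive
    return tp / n - (fp / n) * odds


def net_benefit_treat_all(prevalence, threshold):
    """Net benefit of intervening on every patient, regardless of predicted risk."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    odds = threshold / (1.0 - threshold)
    return prevalence - (1.0 - prevalence) * odds


def decision_curve(y_true, y_prob, thresholds):
    """Return (threshold, model, treat-all, treat-none) net benefits per threshold."""
    n = len(y_true)
    prevalence = sum(y_true) / n
    rows = []
    for pt in thresholds:
        # A patient is flagged for intervention when predicted risk >= threshold.
        tp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 1)
        fp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 0)
        rows.append((pt,
                     net_benefit(tp, fp, n, pt),
                     net_benefit_treat_all(prevalence, pt),
                     0.0))  # intervening on no one always has zero net benefit
    return rows


# Hypothetical toy data (1 = LTFU, 0 = retained), NOT the study data.
toy_labels = [1, 1, 0, 0]
toy_risks = [0.9, 0.6, 0.4, 0.2]
for row in decision_curve(toy_labels, toy_risks, [0.1, 0.3, 0.5]):
    print(row)
```

A model is considered clinically useful at a given threshold when its net benefit exceeds both reference strategies at that threshold, which is the comparison the Results paragraph reports for thresholds of 10% and above.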

Keywords