PLOS Global Public Health (Jan 2022)

Machine learning with routine electronic medical record data to identify people at high risk of disengagement from HIV care in Tanzania.

  • Carolyn A Fahey,
  • Linqing Wei,
  • Prosper F Njau,
  • Siraji Shabani,
  • Sylvester Kwilasa,
  • Werner Maokola,
  • Laura Packel,
  • Zeyu Zheng,
  • Jingshen Wang,
  • Sandra I McCoy

DOI
https://doi.org/10.1371/journal.pgph.0000720
Journal volume & issue
Vol. 2, no. 9
p. e0000720

Abstract

Machine learning methods for optimizing health care delivery have the potential to improve retention in HIV care, a critical target of global efforts to end the epidemic. However, these methods have not been widely applied to medical record data in low- and middle-income countries. We used an ensemble decision tree approach to predict risk of disengagement from HIV care (missing an appointment by ≥28 days) in Tanzania. Our approach used routine electronic medical records (EMR) from the time of antiretroviral therapy (ART) initiation through 24 months of follow-up for 178 adults (63% female). We compared prediction accuracy when using EMR-based predictors alone and in combination with sociodemographic survey data collected by a research study. Models that included only EMR-based indicators and incorporated changes across past clinical visits achieved a mean accuracy of 75.2% for predicting risk of disengagement in the next 6 months, with a mean sensitivity of 54.7% when targeting the 30% of individuals at highest predicted risk. Adding survey-based predictors improved model performance only modestly. The most important variables for prediction were time-varying EMR indicators, including changes in treatment status, body weight, and WHO clinical stage. Machine learning methods applied to existing EMR data in resource-constrained settings can predict individuals' future risk of disengagement from HIV care, potentially enabling better targeting and efficiency of interventions to promote retention in care.
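
The abstract relies on two quantities that are easy to misread: the outcome definition (an appointment missed by ≥28 days) and sensitivity when only the 30% highest-risk individuals are targeted. The Python sketch below illustrates one plausible reading of each. It is not the authors' code: the function names, the `as_of` cutoff for visits never attended, the rounding of the targeted group size, and the example scores are all assumptions made for illustration.

```python
from datetime import date
from typing import Optional, Sequence


def is_disengaged(scheduled: date, attended: Optional[date],
                  as_of: date, threshold_days: int = 28) -> bool:
    """Flag a visit as disengagement if it was missed by >= threshold_days.

    Assumption: a visit that was never attended is measured against
    `as_of`, a hypothetical data-extraction date.
    """
    reference = attended if attended is not None else as_of
    return (reference - scheduled).days >= threshold_days


def sensitivity_at_top_fraction(risk_scores: Sequence[float],
                                outcomes: Sequence[int],
                                fraction: float = 0.30) -> float:
    """Share of true disengagers captured among the top `fraction` by risk.

    Assumption: the targeted group size is fraction * n rounded to the
    nearest integer, with a minimum of one person.
    """
    n = len(risk_scores)
    if n == 0 or n != len(outcomes):
        raise ValueError("risk_scores and outcomes must be non-empty and equal length")
    positives = sum(outcomes)
    if positives == 0:
        raise ValueError("sensitivity is undefined when no one disengaged")

    n_target = max(1, round(fraction * n))
    # Rank individuals from highest to lowest predicted risk.
    ranked = sorted(range(n), key=lambda i: risk_scores[i], reverse=True)
    captured = sum(outcomes[i] for i in ranked[:n_target])
    return captured / positives


# Synthetic example: 10 people, 4 of whom later disengaged (1 = disengaged).
example_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
example_outcomes = [1, 0, 1, 0, 1, 0, 0, 1, 0, 0]
example_sensitivity = sensitivity_at_top_fraction(example_scores, example_outcomes)
```

In this synthetic example, targeting the top 3 of 10 people captures 2 of the 4 who disengaged, so the sensitivity is 0.5. Read the same way, the reported 54.7% would mean that an intervention offered to the 30% at highest predicted risk reaches a little over half of those who go on to disengage.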