Informatics in Medicine Unlocked (Jan 2023)

Utility of a machine-guided tool for assessing risk behaviour associated with contracting HIV in three sites in South Africa

  • M. Majam,
  • B. Segal,
  • J. Fieggen,
  • Eli Smith,
  • L. Hermans,
  • L. Singh,
  • M. Phatsoane,
  • L. Arora,
  • S.T. Lalla-Edward

Journal volume & issue
Vol. 37
p. 101192

Abstract


Introduction: Digital data collection and the associated mobile health technologies have allowed for the recent exploration of artificial intelligence as a tool for combatting the HIV epidemic. Machine learning (ML) has been found useful both for HIV risk prediction and as a decision support tool for guiding pre-exposure prophylaxis (PrEP). This paper reports data from two sequential studies evaluating whether machine learning can predict the susceptibility of adults to HIV infection from responses to a digital survey deployed in a high-burden, low-resource setting.

Methods: A total of 1036 and 593 participants were recruited across the two trials. The first trial was a cross-sectional study at one site; the second was a cohort study across three trial sites. Data from the two studies were merged, partitioned using standard techniques, and used to train and compare multiple machine learning models, from which a final model was selected and evaluated. Variable importance was estimated using the PIMP and SHAP methods.

Results: Characteristics associated with HIV were consistent across both studies. Overall, HIV-positive participants had a higher median age (34 [IQR 29–39] vs 26 [IQR 22–33], p < 0.001), and female participants were more likely than male participants to be HIV positive (155/703 [22%] vs 107/927 [12%], p < 0.001). HIV-positive participants were also more likely to have gone a year or more since their last HIV test (183/262 [70%] vs 540/1368 [39%], p < 0.001) and less likely to report consistent condom use (113/262 [43%] vs 758/1368 [55%], p < 0.001). Participants who reported TB symptoms were more likely to be HIV positive. The trained models achieved areas under the receiver operating characteristic curve (AUROCs) ranging from 78.5% to 82.8%. A boosted tree model performed best, with a sensitivity of 84% (95% CI 72–92), a specificity of 71% (95% CI 67–76), and a negative predictive value of 95% (95% CI 93–96) in a hold-out dataset.
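The sensitivity, specificity and negative predictive value reported above are ratios of confusion-matrix counts, and the AUROC is the probability that a randomly chosen positive case receives a higher risk score than a randomly chosen negative one. The sketch below computes these with the Python standard library only. The abstract does not say which confidence-interval method the authors used, so the Wilson score interval here is an assumption, and the counts in the demo are hypothetical, not the study's data.

```python
import math
from typing import Sequence


def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


def screening_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, tuple[float, float, float]]:
    """Return (estimate, lower, upper) for sensitivity, specificity and NPV."""
    counts = {
        "sensitivity": (tp, tp + fn),  # HIV-positive participants flagged as high risk
        "specificity": (tn, tn + fp),  # HIV-negative participants flagged as low risk
        "npv": (tn, tn + fn),          # low-risk flags that are truly HIV negative
    }
    result = {}
    for name, (k, n) in counts.items():
        lower, upper = wilson_ci(k, n)
        result[name] = (k / n, lower, upper)
    return result


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical hold-out confusion counts, NOT the study's data.
    metrics = screening_metrics(tp=40, fn=10, tn=140, fp=60)
    for name, (est, lo, hi) in metrics.items():
        print(f"{name}: {est:.0%} (95% CI {lo:.0%}-{hi:.0%})")
    print(auroc([0.9, 0.7, 0.4, 0.2], [1, 0, 1, 0]))
```

The high NPV alongside a modest specificity is typical of a screening tool tuned to miss few positives: when most participants are HIV negative, a "low risk" result is usually correct even if many negatives are also flagged.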
Age, duration since last HIV test, and number of male sexual partners were consistently three of the four most important variables under both variable importance methods.

Conclusions: This study highlights the synergies between mobile health and machine learning in HIV. It demonstrates that a viable ML model can be built from digital survey data collected in a low-to-middle-income setting, with potential utility in directing health resources.
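The variable importance rankings mentioned in the abstract can be illustrated with plain permutation feature importance: shuffle one predictor column, re-score the model, and record how much the AUROC falls. This is a simplified stand-in, not the authors' pipeline. As commonly described, PIMP goes further by repeatedly permuting the outcome to build a null distribution of importance scores and derive p-values, while SHAP attributes each individual prediction to its features; neither step is shown here. The `toy_model`, the synthetic data and the function names are hypothetical.

```python
import random
from typing import Callable, Sequence

Row = Sequence[float]


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUROC: share of (positive, negative) pairs ordered correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(
    predict: Callable[[Sequence[Row]], list[float]],
    X: Sequence[Row],
    y: Sequence[int],
    n_repeats: int = 20,
    seed: int = 0,
) -> list[float]:
    """Mean AUROC drop when one feature column is shuffled, breaking its link to y."""
    rng = random.Random(seed)
    baseline = auroc(predict(X), y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Replace column j with its shuffled copy; all other columns are untouched.
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
            drops.append(baseline - auroc(predict(X_perm), y))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(rows: Sequence[Row]) -> list[float]:
    """Hypothetical risk score that depends only on feature 0."""
    return [row[0] for row in rows]


if __name__ == "__main__":
    data_rng = random.Random(1)
    X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
    y = [1 if row[0] > 0.5 else 0 for row in X]
    print(permutation_importance(toy_model, X, y))
```

Because the toy model ignores feature 1, shuffling it leaves every prediction unchanged and its importance is exactly zero, whereas shuffling feature 0 pulls the AUROC towards 0.5. Averaging over several shuffles reduces the noise from any single permutation.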

Keywords