BMC Medical Informatics and Decision Making (Sep 2018)

A comparative study of logistic regression based machine learning techniques for prediction of early virological suppression in antiretroviral initiating HIV patients

  • Kuteesa R. Bisaso,
  • Susan A. Karungi,
  • Agnes Kiragga,
  • Jackson K. Mukonzo,
  • Barbara Castelnuovo

DOI
https://doi.org/10.1186/s12911-018-0659-x
Journal volume & issue
Vol. 18, no. 1
pp. 1 – 10

Abstract

Read online

Abstract Background Treatment with effective antiretroviral therapy (ART) lowers morbidity and mortality among HIV positive individuals. Effective highly active antiretroviral therapy (HAART) should lead to undetectable viral load within 6 months of initiation of therapy. Failure to achieve and maintain viral suppression may lead to development of resistance and increase the risk of viral transmission. In this paper three logistic regression based machine learning approaches are developed to predict early virological outcomes using easily measurable baseline demographic and clinical variables (age, body weight, sex, TB disease status, ART regimen, viral load, CD4 count). The predictive performance and generalizability of the approaches are compared. Methods The multitask temporal logistic regression (MTLR), patient specific survival prediction (PSSP) and simple logistic regression (SLR) models were developed and validated using the IDI research cohort data and predictive performance tested on an external dataset from the EFV cohort. The model calibration and discrimination plots, discriminatory measures (AUROC, F1) and overall predictive performance (brier score) were assessed. Results The MTLR model outperformed the PSSP and SLR models in terms of goodness of fit (RMSE = 0.053, 0.1, and 0.14 respectively), discrimination (AUROC = 0.92, 0.75 and 0.53 respectively) and general predictive performance (Brier score= 0.08, 0.19, 0.11 respectively). The predictive importance of variables varied with time after initiation of ART. The final MTLR model accurately (accuracy = 92.9%) predicted outcomes in the external (EFV cohort) dataset with satisfactory discrimination (0.878) and a low (6.9%) false positive rate. Conclusion Multitask Logistic regression based models are capable of accurately predicting early virological suppression using readily available baseline demographic and clinical variables and could be used to derive a risk score for use in resource limited settings.

Keywords