Scientific Reports (Dec 2024)

Cytokine profiles as predictors of HIV incidence using machine learning survival models and statistical interpretable techniques

  • Sarah Ogutu,
  • Mohanad Mohammed,
  • Henry Mwambi

DOI
https://doi.org/10.1038/s41598-024-81510-y
Journal volume & issue
Vol. 14, no. 1
pp. 1 – 23

Abstract

Read online

Abstract HIV remains a critical global health issue, with an estimated 39.9 million people living with the virus worldwide by the end of 2023 (according to WHO). Although the epidemic’s impact varies significantly across regions, Africa remains the most affected. In the past decade, considerable efforts have focused on developing preventive measures, such as vaccines and pre-exposure prophylaxis, to combat sexually transmitted HIV. Recently, cytokine profiles have gained attention as potential predictors of HIV incidence due to their involvement in immune regulation and inflammation, presenting new opportunities to enhance preventative strategies. However, the high-dimensional, time-varying nature of cytokine data collected in clinical research, presents challenges for traditional statistical methods like the Cox proportional hazards (PH) model to effectively analyze survival data related to HIV. Machine learning (ML) survival models offer a robust alternative, especially for addressing the limitations of the PH model’s assumptions. In this study, we applied survival support vector machine (SSVM) and random survival forest (RSF) models using changes or means in cytokine levels as predictors to assess their association with HIV incidence, evaluate variable importance, measure predictive accuracy using the concordance index (C-index) and integrated Brier score (IBS) and interpret the model’s predictions using Shapley additive explanations (SHAP) values. Our results indicated that RSFs models outperformed SSVMs models, with the difference covariate model performing better than the mean covariate model. The highest C-index for SSVM was 0.7180 under the difference covariate model, while for RSF, it reached 0.8801 under the difference covariate model using the log-rank split rule. Key cytokines identified as positive predictors of HIV incidence included TNF-A, BASIC-FGF, IL-5, MCP-3, and EOTAXIN, while 29 cytokines were negative predictors. Baseline factors such as condom use frequency, treatment status, number of partners, and sexual activity also emerged as significant predictors. This study underscored the potential of cytokine profiles for predicting HIV incidence and highlighted the advantages of RSFs models in analyzing high-dimensional, time-varying data over SSVMs. It further through ablation studies emphasized the importance of selecting key features within mean and difference based covariate models to achieve an optimal balance between model complexity and predictive accuracy.

Keywords