BMC Medical Research Methodology (Jan 2024)

Gaussian process emulation to improve efficiency of computationally intensive multidisease models: a practical tutorial with adaptable R code

  • Sharon Jepkorir Sawe,
  • Richard Mugo,
  • Marta Wilson-Barthes,
  • Brianna Osetinsky,
  • Stavroula A. Chrysanthopoulou,
  • Faith Yego,
  • Ann Mwangi,
  • Omar Galárraga

DOI
https://doi.org/10.1186/s12874-024-02149-x
Journal volume & issue
Vol. 24, no. 1
pp. 1 – 13

Abstract

Read online

Abstract Background The rapidly growing burden of non-communicable diseases (NCDs) among people living with HIV in sub-Saharan Africa (SSA) has expanded the number of multidisease models predicting future care needs and health system priorities. Usefulness of these models depends on their ability to replicate real-life data and be readily understood and applied by public health decision-makers; yet existing simulation models of HIV comorbidities are computationally expensive and require large numbers of parameters and long run times, which hinders their utility in resource-constrained settings. Methods We present a novel, user-friendly emulator that can efficiently approximate complex simulators of long-term HIV and NCD outcomes in Africa. We describe how to implement the emulator via a tutorial based on publicly available data from Kenya. Emulator parameters relating to incidence and prevalence of HIV, hypertension and depression were derived from our own agent-based simulation model and other published literature. Gaussian processes were used to fit the emulator to simulator estimates, assuming presence of noise for design points. Bayesian posterior predictive checks and leave-one-out cross validation confirmed the emulator’s descriptive accuracy. Results In this example, our emulator resulted in a 13-fold (95% Confidence Interval (CI): 8–22) improvement in computing time compared to that of more complex chronic disease simulation models. One emulator run took 3.00 seconds (95% CI: 1.65–5.28) on a 64-bit operating system laptop with 8.00 gigabytes (GB) of Random Access Memory (RAM), compared to > 11 hours for 1000 simulator runs on a high-performance computing cluster with 1500 GBs of RAM. Pareto k estimates were 10 year) period, estimate longer-term prevalence of other co-occurring conditions (e.g., postpartum depression among women living with HIV), and project the impact of nationally-prioritized interventions such as national health insurance schemes and differentiated care models.

Keywords