PLoS ONE (Jan 2023)

Constructing training set using distance between learnt graphical models of time series data on patient physiology, to predict disease scores.

  • Dalia Chakrabarty,
  • Kangrui Wang,
  • Gargi Roy,
  • Akash Bhojgaria,
  • Chuqiao Zhang,
  • Jiri Pavlu,
  • Joydeep Chakrabartty

DOI
https://doi.org/10.1371/journal.pone.0292404
Journal volume & issue
Vol. 18, no. 10
p. e0292404

Abstract


Interventional endeavours in medicine include predicting, at the pre-onset stage, a score that parametrises a new subject's susceptibility to a given disease. Here, for the first time, we provide reliable learning of such a score in the context of the potentially terminal disease VOD (veno-occlusive disease), which often arises after bone marrow transplants. Indeed, the probability of surviving VOD is correlated with early intervention. In our work, the VOD-score of each patient in a retrospective cohort is defined as the distance between two (posterior) probabilities of a random graph variable. The first is conditioned on the inter-variable partial correlation matrix of the time series data on variables that represent different aspects of that patient's physiology. The second is conditioned on the corresponding time series data of an arbitrarily selected reference patient. For each patient in this cohort, such time series data are recorded from a pre-transplant to a post-transplant time. The data available for distinct patients nevertheless have differing temporal coverage, owing to differing patient longevities. Each graph is a Soft Random Geometric Graph drawn in a probabilistic metric space, and the computed inter-graph distance is oblivious to the length of the time series data. The VOD-scores learnt in this way, together with the corresponding pre-transplant parameter vectors of the patients in this retrospective cohort, then constitute the training data. Using these data, we learn the function that takes the VOD-score as its input and outputs the vector of pre-transplant parameters. We model this function with a vector-variate Gaussian Process whose covariance structure is kernel-parametrised. Such modelling is easier than the reverse, in which the score variable would be the output. Then, for any prospective patient whose pre-transplant variables are known, we learn the VOD-score (and the hyperparameters of the covariance kernel) using Markov Chain Monte Carlo (MCMC)-based inference.
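The Python sketch below roughly illustrates the score construction described in the abstract. For each patient it computes the partial correlation matrix from a (time × variable) array and converts it into edge probabilities of a random graph. It then measures the distance between two such graph distributions. Several pieces are assumptions made for illustration and are not the construction used in the paper:

- the soft connection function;
- the use of 1 − |ρ_ij| as the inter-node distance;
- the independence of edges;
- the Hellinger distance.

All function names are hypothetical. Every quantity is computed from the K × K partial correlation matrix, so the resulting score does not depend on the number of time points. This mirrors the length-obliviousness claimed above.

```python
import numpy as np


def partial_correlation(ts):
    """Partial-correlation matrix of a (T x K) multivariate time series.

    Uses the identity rho_ij = -P_ij / sqrt(P_ii * P_jj), where P is the
    inverse (here the pseudo-inverse) of the K x K sample covariance matrix.
    """
    cov = np.cov(ts, rowvar=False)      # K x K sample covariance; columns are variables
    prec = np.linalg.pinv(cov)          # precision matrix
    d = np.sqrt(np.diag(prec))
    pcor = -prec / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def edge_probabilities(pcor, scale=0.5):
    """HYPOTHETICAL soft connection function (an assumption, not the paper's):
    edge i-j is present with probability exp(-(d_ij / scale)^2), where
    d_ij = 1 - |rho_ij| is treated as the distance between nodes i and j.
    Returns the probabilities for the upper-triangle edges only."""
    iu = np.triu_indices_from(pcor, k=1)
    dist = 1.0 - np.abs(pcor[iu])
    return np.exp(-(dist / scale) ** 2)


def hellinger_between_graph_laws(p, q, eps=1e-12):
    """Hellinger distance between two graph distributions in which edges are
    independent Bernoulli(p_ij) and Bernoulli(q_ij) (an assumed simplification).
    For such product measures the Bhattacharyya coefficient factorises over edges."""
    p = np.clip(p, eps, 1.0 - eps)
    q = np.clip(q, eps, 1.0 - eps)
    bc = np.prod(np.sqrt(p * q) + np.sqrt((1.0 - p) * (1.0 - q)))
    return np.sqrt(max(0.0, 1.0 - bc))


def score(ts_patient, ts_reference, scale=0.5):
    """Hypothetical score: distance between the patient's and the reference
    patient's graph distributions. Only K x K matrices enter, so T is irrelevant."""
    p = edge_probabilities(partial_correlation(ts_patient), scale)
    q = edge_probabilities(partial_correlation(ts_reference), scale)
    return hellinger_between_graph_laws(p, q)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.standard_normal((120, 5))   # synthetic reference patient: 120 time points
    patient = rng.standard_normal((45, 5))      # synthetic patient with a shorter record
    print(score(patient, reference))
```

In this example the two synthetic patients have records of different lengths (120 and 45 time points). The score is still well defined, because only their 5 × 5 partial correlation matrices enter the calculation.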
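The next sketch covers the second stage of the abstract: learning the map from VOD-score to the pre-transplant parameter vector, then inverting it for a prospective patient. It is a simplified stand-in that rests on these assumptions:

- each output component is modelled as an independent GP, and all components share one squared-exponential kernel with fixed hyperparameters (the paper uses a vector-variate GP and also infers the kernel hyperparameters);
- the score has a uniform prior on [0, 1];
- the class and function names, and the synthetic data in the demo, are hypothetical.

Given the training pairs, the GP predictive density of the new patient's pre-transplant vector at a candidate score serves as the likelihood. A random-walk Metropolis sampler then explores that score.

```python
import numpy as np


def sq_exp_kernel(a, b, amplitude=1.0, length=0.3):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    return amplitude**2 * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length**2)


class ScoreToParamsGP:
    """Hypothetical simplification of a vector-variate GP: each output component
    is an independent GP sharing one kernel with FIXED hyperparameters."""

    def __init__(self, scores, params, amplitude=1.0, length=0.3, noise=1e-2):
        self.s = np.asarray(scores, dtype=float)     # (N,) training scores
        self.Y = np.asarray(params, dtype=float)     # (N, D) pre-transplant vectors
        self.mean = self.Y.mean(axis=0)              # constant mean per output
        self.hyp = (amplitude, length)
        self.noise = noise
        K = sq_exp_kernel(self.s, self.s, *self.hyp) + noise * np.eye(len(self.s))
        self.L = np.linalg.cholesky(K)               # K = L L^T
        # alpha = K^{-1} (Y - mean), via two solves with the Cholesky factor
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, self.Y - self.mean))

    def predict(self, s_new):
        """Predictive mean (D,) and shared predictive variance at score s_new."""
        k = sq_exp_kernel(np.array([s_new]), self.s, *self.hyp)[0]   # (N,)
        mu = self.mean + k @ self.alpha
        v = np.linalg.solve(self.L, k)                               # v.v = k^T K^{-1} k
        var = self.hyp[0] ** 2 + self.noise - v @ v
        return mu, max(var, 1e-12)

    def log_lik(self, s_new, y_new):
        """Gaussian log-density of the observed vector y_new at score s_new."""
        mu, var = self.predict(s_new)
        r = y_new - mu
        return -0.5 * (r @ r / var + len(r) * np.log(2.0 * np.pi * var))


def sample_score(gp, y_new, lo=0.0, hi=1.0, n_iter=5000, step=0.05, seed=0):
    """Random-walk Metropolis over the new patient's score, uniform prior on [lo, hi]."""
    rng = np.random.default_rng(seed)
    s = 0.5 * (lo + hi)
    ll = gp.log_lik(s, y_new)
    chain = np.empty(n_iter)
    for t in range(n_iter):
        prop = s + step * rng.standard_normal()      # symmetric Gaussian proposal
        if lo <= prop <= hi:                         # outside the prior support: reject
            ll_prop = gp.log_lik(prop, y_new)
            if np.log(rng.uniform()) < ll_prop - ll:
                s, ll = prop, ll_prop
        chain[t] = s
    return chain


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    s_train = rng.uniform(0.0, 1.0, 30)                            # synthetic training scores
    Y_train = np.column_stack([2 * s_train, np.cos(3 * s_train)])  # synthetic 2-D parameter vectors
    gp = ScoreToParamsGP(s_train, Y_train)
    y_new = np.array([0.6, np.cos(0.9)])                           # synthetic patient generated at score 0.3
    chain = sample_score(gp, y_new)
    print(chain[1000:].mean())                                     # posterior mean after burn-in
```

Placing the score on the input side means the GP only has to predict a vector from a scalar. Inverting the map for a new patient then reduces to sampling a single scalar, which is consistent with the abstract's remark that this direction is easier to model.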