Blood Advances (Nov 2019)

Using a machine learning algorithm to predict acute graft-versus-host disease following allogeneic transplantation

  • Yasuyuki Arai,
  • Tadakazu Kondo,
  • Kyoko Fuse,
  • Yasuhiko Shibasaki,
  • Masayoshi Masuko,
  • Junichi Sugita,
  • Takanori Teshima,
  • Naoyuki Uchida,
  • Takahiro Fukuda,
  • Kazuhiko Kakihana,
  • Yukiyasu Ozawa,
  • Tetsuya Eto,
  • Masatsugu Tanaka,
  • Kazuhiro Ikegame,
  • Takehiko Mori,
  • Koji Iwato,
  • Tatsuo Ichinohe,
  • Yoshinobu Kanda,
  • Yoshiko Atsuta

Journal volume & issue
Vol. 3, no. 22
pp. 3626 – 3634

Abstract

Acute graft-versus-host disease (aGVHD) is one of the critical complications that often occur following allogeneic hematopoietic stem cell transplantation (HSCT). Thus far, various types of prediction scores have been created using statistical calculations. The primary objective of this study was to establish and validate a machine learning–dependent index for predicting aGVHD. This was a retrospective cohort study that involved analyzing databases of adult HSCT patients in Japan. The alternating decision tree (ADTree) machine learning algorithm was applied to develop models using the training cohort (70%). The ADTree algorithm was confirmed using the hazard model on data from the validation cohort (30%). Data from 26 695 HSCT patients transplanted from allogeneic donors between 1992 and 2016 were included in this study. The cumulative incidence of aGVHD was 42.8%. Of >40 variables considered, 15 were adapted into a model for aGVHD prediction. The model was tested in the validation cohort, and the incidence of aGVHD was clearly stratified according to the categorized ADTree scores; the cumulative incidence of aGVHD was 29.0% for low risk and 58.7% for high risk (hazard ratio, 2.57). The prediction scores for aGVHD also demonstrated the link between the risk of developing aGVHD and overall survival after HSCT. The machine learning algorithms produced clinically reasonable and robust risk stratification scores. The relatively high reproducibility and low impacts from the interactions among the variables indicate that the ADTree algorithm, along with the other data-mining approaches, may provide tools for establishing risk scores.
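
For readers unfamiliar with alternating decision trees, the following is a minimal sketch of how an ADTree turns patient variables into an additive score that can then be categorized into risk groups. It is not the study's model: the variable names (`hla_mismatch`, `age`, `unrelated_donor`), the prediction values, and the zero cutoff are all hypothetical placeholders chosen for illustration, since the abstract does not list the 15 variables or the score thresholds.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Patient = Dict[str, object]


@dataclass
class Prediction:
    """A prediction node: a real-valued contribution plus optional child splitters."""
    value: float
    splitters: List["Splitter"] = field(default_factory=list)


@dataclass
class Splitter:
    """A decision node: a condition and one prediction node per outcome."""
    condition: Callable[[Patient], bool]
    if_true: Prediction
    if_false: Prediction


def adtree_score(node: Prediction, patient: Patient) -> float:
    """Sum the prediction values on every path the patient satisfies."""
    total = node.value
    for splitter in node.splitters:
        branch = splitter.if_true if splitter.condition(patient) else splitter.if_false
        total += adtree_score(branch, patient)
    return total


def risk_group(score: float, cutoff: float = 0.0) -> str:
    """Categorize a score; the cutoff here is a hypothetical choice."""
    return "high" if score > cutoff else "low"


# Hypothetical tree: all variables and weights below are illustrative only.
tree = Prediction(
    value=-0.2,
    splitters=[
        Splitter(
            condition=lambda p: bool(p["hla_mismatch"]),
            if_true=Prediction(
                value=0.4,
                # Nested splitter: only evaluated when the parent condition holds.
                splitters=[
                    Splitter(
                        condition=lambda p: p["age"] >= 50,
                        if_true=Prediction(0.2),
                        if_false=Prediction(-0.1),
                    )
                ],
            ),
            if_false=Prediction(-0.3),
        ),
        Splitter(
            condition=lambda p: bool(p["unrelated_donor"]),
            if_true=Prediction(0.3),
            if_false=Prediction(-0.1),
        ),
    ],
)

patient_a = {"hla_mismatch": True, "age": 55, "unrelated_donor": True}
patient_b = {"hla_mismatch": False, "age": 30, "unrelated_donor": False}

score_a = adtree_score(tree, patient_a)
score_b = adtree_score(tree, patient_b)
print(risk_group(score_a), risk_group(score_b))
```

Unlike an ordinary decision tree, where a patient follows a single path to one leaf, every top-level splitter here contributes to the total, so each variable's effect is read off as an additive term. In a workflow like the one the abstract describes, such a tree would be learned on the training cohort and the resulting score categories compared in the held-out validation cohort.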