PLoS ONE (Jan 2020)

Application of explainable ensemble artificial intelligence model to categorization of hemodialysis-patient and treatment using nationwide-real-world data in Japan.

  • Eiichiro Kanda,
  • Bogdan I Epureanu,
  • Taiji Adachi,
  • Yuki Tsuruta,
  • Kan Kikuchi,
  • Naoki Kashihara,
  • Masanori Abe,
  • Ikuto Masakane,
  • Kosaku Nitta

DOI
https://doi.org/10.1371/journal.pone.0233491
Journal volume & issue
Vol. 15, no. 5
p. e0233491

Abstract

BACKGROUND: Although dialysis patients are at a high risk of death, it is difficult for medical practitioners to evaluate many inter-related risk factors simultaneously. In this study, we evaluated the characteristics of hemodialysis patients using a machine learning model, and assessed its usefulness for screening hemodialysis patients at a high risk of death within one year, using the nationwide database of the Japanese Society for Dialysis Therapy.

MATERIALS AND METHODS: The patients were divided into two datasets (n = 39,930 each). We categorized hemodialysis patients in Japan into new clusters generated by the K-means clustering method using the development dataset. The association between a cluster and the risk of death was evaluated using multivariate Cox proportional hazards models. We then developed an ensemble model composed of the clusters and support vector machine models in the model development phase, and compared the accuracy of mortality prediction among the machine learning models in the model validation phase.

RESULTS: The average age of the subjects was 65.7 ± 12.2 years; 32.7% had diabetes mellitus. The five clusters clearly distinguished the groups on the basis of their characteristics: Cluster 1, young, male, and chronic glomerulonephritis; Cluster 2, female and chronic glomerulonephritis; Cluster 3, diabetes mellitus; Cluster 4, elderly and nephrosclerosis; Cluster 5, elderly and protein energy wasting. These clusters were associated with the risk of death: for Cluster 5 compared with Cluster 1, the hazard ratio was 8.86 (95% CI 7.68, 10.21). The accuracy of the ensemble model for the prediction of 1-year death was 0.948, higher than that of the logistic regression model (0.938), the support vector machine model (0.937), and the deep learning model (0.936).

CONCLUSIONS: The clusters clearly categorized patients by their characteristics and reflected their prognosis. Our real-world-data-based machine learning system is applicable to identifying high-risk hemodialysis patients in clinical settings, and has strong potential to guide treatment and improve prognosis.
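
The clustering step described in the methods can be illustrated with a minimal sketch. It is not the authors' implementation: the two scaled features, the six toy patients, the outcome flags, the farthest-point initialization, and the function names `kmeans` and `death_rate_by_cluster` are all assumptions made for illustration. It covers only K-means grouping followed by a crude per-cluster death rate, and leaves out the Cox models and the support vector machine ensemble.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Deterministic start (an assumption of this sketch): take the first
    point, then repeatedly add the point farthest from the chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iterations=100):
    """Lloyd's algorithm: alternate assignment and centroid update
    until the labels stop changing."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels


def death_rate_by_cluster(labels, died, k):
    """Fraction of patients with the outcome in each cluster."""
    rates = {}
    for c in range(k):
        outcomes = [d for lab, d in zip(labels, died) if lab == c]
        rates[c] = sum(outcomes) / len(outcomes) if outcomes else None
    return rates


# Hypothetical toy data: (scaled age, scaled nutritional marker) per patient.
patients = [
    (0.10, 0.90), (0.20, 0.80), (0.15, 0.85),   # younger, well nourished
    (0.90, 0.20), (0.80, 0.10), (0.85, 0.15),   # older, poorly nourished
]
died_within_one_year = [0, 0, 0, 1, 1, 0]       # made-up outcome flags

centroids, labels = kmeans(patients, k=2)
rates = death_rate_by_cluster(labels, died_within_one_year, k=2)
```

On real registry data the features would need standardizing before clustering, and the association between cluster and death would be estimated with an adjusted survival model rather than a raw proportion, as the abstract describes.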