Scientific Reports (Aug 2022)

Prediction of mortality risk of health checkup participants using machine learning-based models: the J-SHC study

  • Kazuharu Kawano,
  • Yoichiro Otaki,
  • Natsuko Suzuki,
  • Shouichi Fujimoto,
  • Kunitoshi Iseki,
  • Toshiki Moriyama,
  • Kunihiro Yamagata,
  • Kazuhiko Tsuruya,
  • Ichiei Narita,
  • Masahide Kondo,
  • Yugo Shibagaki,
  • Masato Kasahara,
  • Koichi Asahi,
  • Tsuyoshi Watanabe,
  • Tsuneo Konta

DOI
https://doi.org/10.1038/s41598-022-18276-8
Journal volume & issue
Vol. 12, no. 1
pp. 1 – 8

Abstract

Early detection and treatment of diseases through health checkups are effective in improving life expectancy. In this study, we compared the ability of two machine learning-based models (a gradient boosting decision tree [XGBoost] and a neural network) and a conventional logistic regression model to predict 5-year mortality in 116,749 health checkup participants. We built the prediction models using a training dataset of 85,361 participants from 2008 and evaluated them using a test dataset of 31,388 participants from 2009 to 2014. Predictive ability was evaluated by the area under the receiver operating characteristic curve (AUC) in the test dataset. The AUC values were 0.811 for XGBoost, 0.774 for the neural network, and 0.772 for logistic regression, indicating that XGBoost had the highest predictive ability. The importance of each explanatory variable was evaluated using SHapley Additive exPlanations (SHAP) values, which were similar among these models. This study showed that the machine learning-based XGBoost model had higher predictive ability than the conventional logistic regression model and may be useful for risk assessment and health guidance for health checkup participants.
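
The AUC used above to compare the models can be read as the probability that a randomly chosen participant who died receives a higher predicted risk than a randomly chosen survivor. The following is a minimal sketch of that calculation in plain Python, not the authors' code; the function name roc_auc and the toy labels and scores are hypothetical and are not data from the study.

```python
def roc_auc(labels, scores):
    """Rank-based AUC (Mann-Whitney U statistic scaled to [0, 1]).

    labels: 1 for an event (e.g. death within 5 years), 0 otherwise.
    scores: predicted risk from any model; higher means higher risk.
    Tied scores share the average of their ranks, so a tied
    positive/negative pair counts as half a correct ordering.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    rank_sum_pos = 0.0
    n_pos = 0
    i = 0
    while i < n:
        # Find the run of entries that share the same score.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        # Entries i..j-1 occupy 1-based ranks i+1..j; use their mean.
        avg_rank = (i + 1 + j) / 2.0
        for k in range(i, j):
            if pairs[k][1] == 1:
                rank_sum_pos += avg_rank
                n_pos += 1
        i = j
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    # Subtract the smallest possible rank sum, then divide by the pair count.
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


# Hypothetical toy example (not study data): four participants.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80]
toy_auc = roc_auc(toy_labels, toy_scores)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported values of 0.811, 0.774, and 0.772 are compared.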