BMJ Nutrition, Prevention & Health

Random forest approach for determining risk prediction and predictive factors of type 2 diabetes: large-scale health check-up data in Japan

  • Zentaro Yamagata,
  • Hiroshi Yokomichi,
  • Tadao Ooka,
  • Hisashi Johno,
  • Kazunori Nakamoto,
  • Yoshioki Yoda

DOI
https://doi.org/10.1136/bmjnph-2020-000200

Abstract

Introduction Early intervention in type 2 diabetes can prevent exacerbation of insulin resistance. More effective interventions can be implemented by early and precise prediction of the change in glycated haemoglobin A1c (HbA1c). Artificial intelligence (AI), which has been introduced into various medical fields, may be useful in predicting changes in HbA1c. However, the inability to explain the predictive factors has been a problem in the use of deep learning, the leading AI technology. Therefore, we applied a highly interpretable AI method, random forest (RF), to large-scale health check-up data and examined whether there was an advantage over a conventional prediction model.

Research design and methods This study included a cumulative total of 42 908 subjects not receiving treatment for diabetes with an HbA1c <6.5%. The objective variable was the change in HbA1c in the next year. Each prediction model was created with 51 health-check items and the change values of some of these items from the previous year. We used two analytical methods to compare the predictive powers: RF as a new model and multivariate logistic regression (MLR) as a conventional model. We also created models excluding the change values to determine whether including them improved the predictions. In addition, variable importance was calculated in the RF analysis, and standard regression coefficients were calculated in the MLR analysis to identify the predictors.

Results The RF model showed a higher predictive power for the change in HbA1c than MLR in all models. The RF model including change values showed the highest predictive power. In the RF prediction model, HbA1c, fasting blood glucose, body weight, alkaline phosphatase and platelet count were factors with high predictive power.

Conclusions Correct use of the RF method may enable highly accurate risk prediction for the change in HbA1c and may allow the identification of new diabetes risk predictors.
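
The comparison described under Research design and methods can be illustrated with a minimal Python sketch using scikit-learn. This is not the authors' code or data: the four features, the synthetic outcome rule, the binary coding of the outcome, the hyperparameters and the use of AUC as the measure of predictive power are all assumptions made for illustration. It only shows the general workflow of fitting an RF and a logistic regression, refitting without a change value, and reading off variable importance and coefficients on standardised inputs.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 2000

# Synthetic stand-ins for a few check-up items (NOT the study data).
hba1c = rng.normal(5.6, 0.3, n)
glucose = rng.normal(95, 10, n)
weight = rng.normal(62, 11, n)
weight_change = rng.normal(0, 2, n)  # hypothetical "change value" from the previous year

# Hypothetical binary outcome (1 = HbA1c rises next year); this rule is invented.
signal = 0.08 * (glucose - 95) + 0.02 * (weight - 62) + 0.4 * weight_change
y = (signal + rng.normal(0, 1, n) > 0.5).astype(int)

feature_names = ["hba1c", "fasting_glucose", "body_weight", "body_weight_change"]
X = np.column_stack([hba1c, glucose, weight, weight_change])
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Random forest versus logistic regression on the same features.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
mlr = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
mlr.fit(X_train, y_train)

auc_rf = roc_auc_score(y_test, rf.predict_proba(X_test)[:, 1])
auc_mlr = roc_auc_score(y_test, mlr.predict_proba(X_test)[:, 1])

# Same RF without the change value (last column), to see what it contributes.
rf_no_change = RandomForestClassifier(n_estimators=200, random_state=0)
rf_no_change.fit(X_train[:, :3], y_train)
auc_rf_no_change = roc_auc_score(
    y_test, rf_no_change.predict_proba(X_test[:, :3])[:, 1]
)

# Impurity-based variable importance (RF) and coefficients on standardised
# inputs (logistic regression), keyed by feature name.
rf_importance = dict(zip(feature_names, rf.feature_importances_))
mlr_coefs = dict(
    zip(feature_names, mlr.named_steps["logisticregression"].coef_[0])
)

print(f"AUC RF: {auc_rf:.3f}  MLR: {auc_mlr:.3f}  RF without change: {auc_rf_no_change:.3f}")
print(rf_importance)
print(mlr_coefs)
```

Because the outcome here is generated from an invented linear rule, the relative performance of the two models in this sketch says nothing about the study's findings; it is meant only to make the compared quantities concrete.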