Risk Management and Healthcare Policy (Nov 2019)

Machine Learning For Tuning, Selection, And Ensemble Of Multiple Risk Scores For Predicting Type 2 Diabetes

  • Liu Y,
  • Ye S,
  • Xiao X,
  • Sun C,
  • Wang G,
  • Wang G,
  • Zhang B

Journal volume & issue
Vol. 12
pp. 189 – 198

Abstract

Yujia Liu,1 Shangyuan Ye,2 Xianchao Xiao,1 Chenglin Sun,1 Gang Wang,1 Guixia Wang,1 Bo Zhang3

1Department of Endocrinology and Metabolism, The First Hospital of Jilin University, Changchun, Jilin 130021, People’s Republic of China; 2Department of Population Medicine, Harvard Pilgrim Health Care and Harvard Medical School, Boston, MA, USA; 3Department of Neurology and ICCTR Biostatistics and Research Design Center, Boston Children’s Hospital and Harvard Medical School, Boston, MA, USA

Correspondence: Bo Zhang, Department of Neurology and ICCTR Biostatistics and Research Design Center, Boston Children’s Hospital and Harvard Medical School, 21 Autumn Street, Boston, MA 02115, USA. Email [email protected]

Guixia Wang, Department of Endocrinology and Metabolism, The First Hospital of Jilin University, 71 Xinmin Street, Changchun, Jilin 130021, People’s Republic of China. Email [email protected]

Purpose: This study proposes the use of machine learning algorithms to improve the accuracy of type 2 diabetes predictions using non-invasive risk score systems.

Methods: We evaluated and compared the prediction accuracies of existing non-invasive risk score systems using data from the REACTION study (Risk Evaluation of Cancers in Chinese Diabetic Individuals: A Longitudinal Study). Two simple risk scores were established on the basis of logistic regression. Machine learning techniques (ensemble methods) were used to improve prediction accuracy by combining the individual score systems.

Results: Existing score systems from Western populations generally performed worse than those from Eastern populations. The two newly established score systems performed better than most existing score systems but slightly worse than the Chinese score system. Using ensemble methods with model selection algorithms yielded better prediction accuracy than all the simple score systems.

Conclusion: Our proposed machine learning methods can be used to improve the accuracy of screening for undiagnosed type 2 diabetes and identifying high-risk patients.

Keywords: type 2 diabetes, risk score, machine learning, voting, stacking, prediction
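
The idea of combining several risk scores by voting, named in the Methods and Keywords above, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, scores, cutoffs, and score ranges are all hypothetical and are not taken from the REACTION study or from any published score system. Stacking, which would instead fit a second-stage model on the individual scores, is not shown.

```python
# Hypothetical illustration only: every number below is made up.

def majority_vote(scores, cutoffs):
    """Return 1 (high risk) if more than half of the score systems flag the person."""
    flags = [1 if s >= c else 0 for s, c in zip(scores, cutoffs)]
    return 1 if sum(flags) * 2 > len(flags) else 0


def soft_vote(scores, ranges, weights=None):
    """Weighted average of scores after rescaling each to [0, 1] by its (min, max) range."""
    if weights is None:
        weights = [1.0] * len(scores)
    scaled = [(s - lo) / (hi - lo) for s, (lo, hi) in zip(scores, ranges)]
    return sum(w * x for w, x in zip(weights, scaled)) / sum(weights)


# One person scored by three hypothetical risk score systems.
person_scores = [12, 5, 30]
cutoffs = [10, 7, 25]                   # assumed high-risk thresholds
ranges = [(0, 20), (0, 10), (0, 50)]    # assumed possible range of each score

hard = majority_vote(person_scores, cutoffs)  # two of three systems flag this person
soft = soft_vote(person_scores, ranges)       # mean of 0.6, 0.5, 0.6
```

Hard voting discards how far a score lies from its cutoff, whereas soft voting keeps that information but requires the scores to be put on a common scale first.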

Keywords