Iranian Journal of Public Health (May 2022)

The Prediction Models for High-Risk Population of Stroke Based on Logistic Regressive Analysis and Lightgbm Algorithm Separately

  • Yicheng Xue,
  • Silong Chen,
  • Mengmeng Zhang,
  • Xiaojuan Cai,
  • Jialian Zheng,
  • Shihua Wang,
  • Yan Chen

DOI
https://doi.org/10.18502/ijph.v51i5.9415
Journal volume & issue
Vol. 51, no. 5

Abstract

Background: We aimed to identify risk factors for stroke using logistic regression analysis and the LightGBM algorithm separately, and to compare the results of the two models to guide stroke prevention. Methods: Residents older than 40 years of age were sampled from two medical examination centers in Jiaxing, China, from 2018 to 2019. Of the 2124 subjects, 1059 were middle-aged (40-59 years old) and 1065 were elderly (≥60 years old). Their demographic characteristics, medical history, family history, eating habits, etc. were recorded and input separately into logistic regression analysis and the LightGBM algorithm to build prediction models for the population at high risk of stroke. Four metrics (F1 score, accuracy, recall, and AUROC) were compared between the two models. Results: Stroke risk was positively correlated with age and negatively correlated with the frequency of fruit consumption and with taste preference. A low-salt diet was associated with lower stroke risk than a high-salt diet, and men had a higher stroke risk than women. In addition, stroke risk was positively correlated with the frequency of alcohol consumption in the middle-aged group and negatively correlated with education level in the elderly group. All four metrics were higher for LightGBM than for logistic regression, except for recall in the middle-aged group. Conclusion: Age, gender, family history of hypertension and diabetes, the frequency of consumption of fruit, alcohol, and dairy products, taste preference, and education level could serve as predictive risk factors for stroke. The model using the LightGBM algorithm is more accurate than the model using logistic regression analysis.
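
The four metrics named in the Methods (F1 score, accuracy, recall, and AUROC) can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and the scores stand in for the predicted probabilities that either model (logistic regression or LightGBM) would output.

```python
from typing import Dict, Sequence


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive case receives a
    higher score than a random negative case (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> Dict[str, float]:
    """Compute accuracy, recall, F1 score, and AUROC for one model.

    The threshold of 0.5 is an assumed default, not taken from the paper.
    """
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "recall": recall,
            "f1": f1, "auroc": auroc(y_true, scores)}


# Toy data (hypothetical): 1 = high risk of stroke, 0 = not high risk.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
metrics = evaluate(y_true, scores)
print(metrics)
```

Running `evaluate` once per model on the same held-out subjects would give the kind of side-by-side comparison the abstract reports. Recall and AUROC can move in different directions, which is consistent with one model leading on three metrics but not on recall.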

Keywords