Zhongguo cuzhong zazhi (Jun 2024)

Establishment and Evaluation of the Prediction Models of the Relationship between Cardiovascular and Cerebrovascular Diseases and Meteorological Factors

  • 尚媛媛1, 杜正静2, 陈静怡3, 彭波2, 龙杰琦1 (SHANG Yuanyuan1, DU Zhengjing2, CHEN Jingyi3, PENG Bo2, LONG Jieqi1)

DOI
https://doi.org/10.3969/j.issn.1673-5765.2024.06.004
Journal volume & issue
Vol. 19, no. 6
pp. 632 – 639

Abstract

Objective To explore the relationship between the incidence of cardiovascular and cerebrovascular diseases and meteorological factors, and to predict incidence risk levels using machine learning methods, in order to provide a scientific basis for disease prevention and control.

Methods Patients with cardiovascular and cerebrovascular diseases, whose data were provided by the Guizhou Center for Disease Control and Prevention, were selected as subjects. Predictive factors were determined through correlation analysis, and prediction models for incidence risk were constructed with four machine learning models: support vector machine, extreme gradient boosting, light gradient boosting machine, and random forest. The included patients were divided into a training set and a testing set at a ratio of 8:2. The training set was used for model training and parameter optimization, and the testing set was used to evaluate model performance. Predictive performance was evaluated mainly by accuracy.

Results A total of 16 383 patients over 60 years of age with cardiovascular and cerebrovascular diseases were included, of whom 6507 were women. Diagnoses included acute myocardial infarction, stroke, angina pectoris, and sudden cardiac death, and the daily case counts formed an imbalanced dataset. The number of daily cases was correlated with 26 meteorological factors in 3 categories (air pressure, air temperature, and humidity): it was positively correlated with air pressure and relative humidity and negatively correlated with air temperature. The GridSearchCV function was used to find the optimal weight ratio, the models were then constructed with the machine learning methods, and their performance metrics were obtained on the testing set. The light gradient boosting machine model performed best in the prediction task, with an accuracy of 85.68%, a precision of 82.56%, a recall of 85.68%, and an F1 score of 79.56% (all P<0.05). The air temperature 72 h before onset had an INP value of 63 814, making it the most important meteorological factor affecting the number of daily cases. The air temperatures 48 h and 24 h before onset ranked second and third, with INP values of 62 002 and 43 216, respectively.

Conclusions The prediction models for the incidence of cardiovascular and cerebrovascular diseases built with machine learning methods have high predictive value. Among them, the light gradient boosting machine model showed the best predictive performance.
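A note on reading the reported metrics: the recall given for the light gradient boosting machine model is identical to its accuracy (85.68%). The abstract does not say how precision, recall, and F1 were averaged across risk levels. One possible reading, offered here only as an assumption, is support-weighted averaging, under which recall always equals accuracy. The minimal sketch below illustrates that property with hypothetical labels and a hypothetical helper, `weighted_metrics`; it is not the authors' evaluation code.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall, and F1.

    Hypothetical helper for illustration; the averaging scheme is an
    assumption, not something stated in the abstract.
    """
    n = len(y_true)
    support = Counter(y_true)  # number of true samples per class
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n

    precision = recall = f1 = 0.0
    for label, count in support.items():
        # True positives and total predictions for this class
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)

        prec_c = tp / predicted if predicted else 0.0
        rec_c = tp / count
        denom = prec_c + rec_c
        f1_c = 2 * prec_c * rec_c / denom if denom else 0.0

        # Weight each per-class score by the class's share of the samples
        weight = count / n
        precision += weight * prec_c
        recall += weight * rec_c
        f1 += weight * f1_c

    return accuracy, precision, recall, f1


# Hypothetical imbalanced risk levels: eight low-risk days, two high-risk days
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

acc, prec, rec, f1 = weighted_metrics(y_true, y_pred)
print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f} f1={f1:.4f}")
```

Weighted recall sums each class's true positives divided by the total sample count, which is exactly the accuracy. With imbalanced daily case counts, this means the recall figure carries no information beyond accuracy, so per-class results would be needed to judge how well the rarer risk levels are predicted.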

Keywords