Digital Health (Oct 2024)
Designing machine learning for big data: A study to identify factors that increase the risk of ischemic stroke and prognosis in hypertensive patients
Abstract
Background: Ischemic stroke (IS) accounts for a large proportion of stroke incidence. The aim of this study was to identify the risk and prognostic factors affecting the occurrence of IS in hypertensive patients.
Method: Study data were obtained from the Medical Information Mart for Intensive Care (MIMIC)-IV database. To avoid bias in the factor selection process, several approaches were examined, including logistic regression, elastic net regression, random forest, correlation analysis, and multifactor logistic regression. Seven different machine-learning methods were used to construct predictive models. The performance of the developed models was evaluated using AUC (Area Under the Curve), prediction accuracy, precision, recall, F1 score, PPV (Positive Predictive Value), and NPV (Negative Predictive Value). Interaction analysis was conducted to explore potential relationships between influential factors.
Results: The study included 92,514 hypertensive patients, of whom 1,746 experienced IS. The Gradient Boosted Decision Tree (GBDT) model outperformed the other prediction models in terms of prediction accuracy and AUC for both IS occurrence and prognosis. Using SHapley Additive exPlanations (SHAP), we found that a range of factors, and the interactions between them, are important risk factors for IS and its prognosis in hypertensive patients.
Conclusion: The study identified factors that increase the risk of IS and of poor prognosis in hypertensive patients, which may provide guidance for clinical diagnosis and treatment.
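The evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the study's code: the function names, the 0.5 decision threshold, and the eight toy labels and scores are assumptions made up for illustration and are not drawn from MIMIC-IV. Note that precision and PPV are the same quantity, TP / (TP + FP).

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = IS, 0 = no IS)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def _ratio(num, den):
    """Safe division: return NaN when the denominator is zero."""
    return num / den if den else float("nan")


def classification_metrics(y_true, y_pred):
    """Accuracy, precision (= PPV), recall, F1, and NPV from hard predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = _ratio(tp, tp + fp)
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "ppv": precision,  # PPV is another name for precision
        "recall": _ratio(tp, tp + fn),
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
        "npv": _ratio(tn, tn + fn),
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg), so it is
    only suitable for small illustrative inputs.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (assumption): 3 IS cases and 5 non-cases with model scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

With only 1,746 IS cases among 92,514 patients, the classes are strongly imbalanced, so accuracy alone can look high for a model that rarely predicts IS; reporting recall, PPV, NPV, and AUC alongside it gives a fuller picture.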