Predicting stroke occurrences: a stacked machine learning approach with feature selection and data preprocessing

Pritam Chakraborty; Anjan Bandyopadhyay; Preeti Padma Sahu; Aniket Burman; Saurav Mallik; Najah Alsubaie; Mohamed Abbas; Mohammed S. Alqahtani; Ben Othman Soufiene

doi:10.1186/s12859-024-05866-8

BMC Bioinformatics (Oct 2024)

Affiliations

Pritam Chakraborty: School of Computer Engineering, KIIT University
Anjan Bandyopadhyay: School of Computer Engineering, KIIT University
Preeti Padma Sahu: School of Computer Engineering, KIIT University
Aniket Burman: School of Computer Engineering, KIIT University
Saurav Mallik: Department of Environmental Health, Harvard T.H. Chan School of Public Health
Najah Alsubaie: Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University
Mohamed Abbas: Electrical Engineering Department, College of Engineering, King Khalid University
Mohammed S. Alqahtani: Radiological Sciences Department, College of Applied Medical Sciences, King Khalid University
Ben Othman Soufiene: PRINCE Laboratory Research, ISITcom, Hammam Sousse, University of Sousse

DOI: https://doi.org/10.1186/s12859-024-05866-8
Journal volume & issue: Vol. 25, no. 1, pp. 1–23

Abstract

Stroke prediction remains a critical area of healthcare research, with the aim of improving early intervention and patient care strategies. This study investigates the efficacy of machine learning techniques, particularly principal component analysis (PCA) and a stacking ensemble method, for predicting stroke occurrences from demographic, clinical, and lifestyle factors. We systematically varied the number of PCA components and implemented a stacking model comprising random forest, decision tree, and K-nearest neighbors (KNN) classifiers. Our findings show that setting the number of PCA components to 16 gave the best predictive performance, with 98.6% accuracy in stroke prediction. Evaluation metrics underscored the robustness of our approach in handling class imbalance and improving model performance. Comparative analyses against traditional machine learning algorithms such as SVM, logistic regression, and Naive Bayes showed that the proposed method outperformed them.
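
The pipeline described in the abstract (PCA with 16 components feeding a stacking ensemble of random forest, decision tree, and KNN) can be illustrated with a minimal scikit-learn sketch. This is not the authors' implementation: the abstract does not name the meta-learner, the preprocessing steps, the hyperparameters, or the dataset, so the logistic regression meta-learner, the standard scaling step, the default-like hyperparameters, the `build_stacked_model` helper, and the synthetic imbalanced data below are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def build_stacked_model(n_components=16, random_state=0):
    """Hypothetical helper: scaling -> PCA -> stacking ensemble."""
    # Base learners named in the abstract; hyperparameters are assumed.
    base_learners = [
        ("rf", RandomForestClassifier(n_estimators=100, random_state=random_state)),
        ("dt", DecisionTreeClassifier(random_state=random_state)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ]
    # Assumption: logistic regression as the meta-learner, trained on
    # out-of-fold predictions of the base learners (5-fold CV).
    stack = StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    )
    # Assumption: features are standardized before PCA.
    return make_pipeline(StandardScaler(), PCA(n_components=n_components), stack)


# Synthetic stand-in for a stroke dataset: 20 numeric features and a
# roughly 90/10 class split to mimic class imbalance. Not real data.
X, y = make_classification(
    n_samples=600,
    n_features=20,
    n_informative=10,
    weights=[0.9, 0.1],
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = build_stacked_model(n_components=16)
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(f"Held-out accuracy on synthetic data: {test_accuracy:.3f}")
```

The accuracy printed here reflects only the synthetic data and says nothing about the 98.6% reported in the study. Varying `n_components` in a loop and comparing held-out scores would mirror the component sweep the abstract describes; on imbalanced data, metrics such as recall or F1 for the minority class are likely more informative than accuracy alone.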

Published in BMC Bioinformatics

ISSN: 1471-2105 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Biology (General)
Website: http://www.biomedcentral.com/bmcbioinformatics/