A Hybrid Feature Selection and Ensemble Stacked Learning Models on Multi-Variant CVD Datasets for Effective Classification

Abhigya Mahajan; Baijnath Kaushik; Mohammad Khalid Imam Rahmani; Abdulbasid S. Banga

doi:10.1109/ACCESS.2024.3412077

IEEE Access (Jan 2024)

Affiliations

Abhigya Mahajan: School of Computer Science and Engineering, Shri Mata Vaishno Devi University, Katra, Jammu and Kashmir, India
Baijnath Kaushik: School of Computer Science and Engineering, Shri Mata Vaishno Devi University, Katra, Jammu and Kashmir, India
Mohammad Khalid Imam Rahmani: College of Computing and Informatics, Saudi Electronic University, Riyadh, Saudi Arabia
Abdulbasid S. Banga: College of Computing and Informatics, Saudi Electronic University, Riyadh, Saudi Arabia

DOI: https://doi.org/10.1109/ACCESS.2024.3412077
Journal volume & issue: Vol. 12
pp. 87023 – 87038

Abstract

Predicting heart disease has recently emerged as a formidable challenge in the medical domain. Heart disease is recognized as a major global health concern and is one of the primary causes of mortality, so early detection helps to reduce deaths. This study experiments with three benchmark datasets, UCI Heart Disease, Framingham, and Z-Alizadeh Sani, which contain important clinical information on cardiovascular disease (CVD). The multi-variant (categorical and continuous) features, variable dimensions, and multicollinearity of these datasets pose substantial challenges for machine learning (ML) and other models. This study proposes a statistical feature selection (SFS) stacking framework that uses four feature selection techniques, Chi-Square, Gini Index, Information Gain, and the ANOVA F-test, to select the optimal features from each dataset. The reduced set of optimized features is then fed to two ensemble stacked learning models to predict the likelihood of developing CVD: stacking using Support Vector Machine (SFS-SVM) and stacking using Cross-Validation Classifier (SFS-SCVC). The SFS-SCVC model achieved strong performance and outperformed the SFS-SVM and traditional ML models on all three datasets.
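To make the feature selection step in the abstract concrete, the following is a minimal sketch of how three of the named criteria (Chi-Square, Information Gain, and Gini Index) might score categorical features against a binary CVD label. It is not the authors' implementation. The abstract does not say how the scores from the different criteria are combined, so the rank-sum voting in `select_features` is an assumption, as are all function names, the feature names, and the toy data. The ANOVA F-test, which applies to continuous features, is left out for brevity.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_reduction(feature, labels, impurity):
    """Parent impurity minus the size-weighted impurity of each feature-value group.

    With `entropy` this is information gain; with `gini` it is the Gini gain.
    """
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    weighted = sum(len(g) / n * impurity(g) for g in groups.values())
    return impurity(labels) - weighted


def chi_square(feature, labels):
    """Pearson chi-square statistic of the feature-by-label contingency table."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    stat = 0.0
    for f, f_count in feature_totals.items():
        for l, l_count in label_totals.items():
            expected = f_count * l_count / n
            stat += (observed[(f, l)] - expected) ** 2 / expected
    return stat


def select_features(columns, labels, k):
    """Keep the k features with the best (lowest) summed rank across criteria.

    Rank-sum voting is an assumption made for this sketch, not the paper's rule.
    """
    scorers = [
        chi_square,
        lambda f, y: impurity_reduction(f, y, entropy),
        lambda f, y: impurity_reduction(f, y, gini),
    ]
    rank_sum = {name: 0 for name in columns}
    for scorer in scorers:
        ordered = sorted(
            columns, key=lambda name: scorer(columns[name], labels), reverse=True
        )
        for rank, name in enumerate(ordered):
            rank_sum[name] += rank
    return sorted(columns, key=lambda name: rank_sum[name])[:k]


# Hypothetical toy data: 8 patients, 3 binary features, 1 = CVD present.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "noise": [1, 0, 1, 0, 1, 0, 1, 0],       # independent of the label
    "smoker": [1, 1, 1, 0, 1, 0, 0, 0],      # partially informative
    "chest_pain": [1, 1, 1, 1, 0, 0, 0, 0],  # perfectly informative
}

selected = select_features(columns, labels, k=2)
print(selected)
```

In a stacking framework such as the one the abstract describes, the selected columns would then be passed to the base learners and the meta-learner. That stage is not sketched here.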

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
