IEEE Access (Jan 2024)
A Hybrid Feature Selection and Ensemble Stacked Learning Models on Multi-Variant CVD Datasets for Effective Classification
Abstract
Predicting heart disease has become a major challenge in the medical domain. Heart disease is a global health concern and one of the leading causes of mortality, and early detection helps to reduce that mortality. This study experiments with three benchmark datasets, UCI Heart Disease, Framingham, and Z-Alizadeh Sani, which contain important clinical information on cardiovascular disease (CVD). The datasets’ multi-variant (categorical and continuous) features, varying dimensions, and multicollinearity pose substantial challenges for machine learning (ML) and other models. This study proposes a statistical feature selection (SFS) stacking framework that uses four feature engineering techniques, Chi-Square, Gini Index, Information Gain, and ANOVA F-test, to select the optimal features from each dataset. The resulting reduced feature set is then fed to two ensemble stacked learning models that predict the likelihood of developing CVD: stacking using Support Vector Machine (SFS-SVM) and stacking using Cross-Validation Classifier (SFS-SCVC). The SFS-SCVC model outperformed both the SFS-SVM and traditional ML models on all three datasets.
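The abstract names the selection criteria but gives no implementation details. As a minimal sketch of two of them, Information Gain and the Gini Index, the following standard-library Python scores categorical features by how much they reduce class impurity and keeps the top k. The function names, feature names, and toy records are hypothetical and are not taken from the three datasets or from the authors' code.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_reduction(feature, labels, impurity):
    """Parent impurity minus the size-weighted impurity of the groups
    obtained by splitting the labels on each distinct feature value."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    weighted = sum(len(g) / n * impurity(g) for g in groups.values())
    return impurity(labels) - weighted


def select_top_k(features, labels, k, impurity):
    """Score every feature column and return the k best names plus all scores."""
    scores = {
        name: impurity_reduction(column, labels, impurity)
        for name, column in features.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical toy records: 1 = disease present, 0 = absent.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "chest_pain": [1, 1, 1, 1, 0, 0, 0, 0],  # separates the classes perfectly
    "smoker":     [1, 1, 1, 0, 1, 0, 0, 0],  # partially informative
    "sex":        [1, 0, 1, 0, 1, 0, 1, 0],  # uninformative in this toy data
}

selected_ig, ig_scores = select_top_k(features, labels, k=2, impurity=entropy)
selected_gini, gini_scores = select_top_k(features, labels, k=2, impurity=gini)

print("Information Gain selection:", selected_ig)
print("Gini Index selection:", selected_gini)
```

With entropy as the impurity measure the score is Information Gain; with Gini impurity it is the Gini-based reduction. Continuous features would first need discretization, or a criterion suited to them such as the ANOVA F-test.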
Keywords