IEEE Access (Jan 2024)

A Hybrid Feature Selection and Ensemble Stacked Learning Models on Multi-Variant CVD Datasets for Effective Classification

  • Abhigya Mahajan,
  • Baijnath Kaushik,
  • Mohammad Khalid Imam Rahmani,
  • Abdulbasid S. Banga

DOI
https://doi.org/10.1109/ACCESS.2024.3412077
Journal volume & issue
Vol. 12
pp. 87023 – 87038

Abstract

Predicting heart disease has recently emerged as a formidable challenge in the medical domain. Heart disease is recognized as a major global health concern and is one of the primary causes of mortality, posing a significant threat to human life. Early detection helps to reduce mortality. This study experiments with three benchmark datasets, UCI Heart Disease, Framingham, and Z-Alizadeh Sani, which contain important clinical information for cardiovascular disease (CVD). The multi-variant (categorical and continuous) features, variable dimensions, and multicollinearity of these datasets pose substantial challenges for machine learning (ML) and other models. This study proposes a statistical feature selection (SFS) stacking framework that uses four feature selection techniques, Chi-Square, Gini Index, Information Gain, and ANOVA F-test, to select the optimal features from each dataset. The reduced set of optimized features is then fed to two ensemble stacked learning models that predict the likelihood of developing CVD: stacking using Support Vector Machine (SFS-SVM) and stacking using Cross-Validation Classifier (SFS-SCVC). The SFS-SCVC model outperformed the SFS-SVM and traditional ML models on all three datasets.
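
As a rough illustration of the feature selection step, the sketch below scores categorical features by Information Gain and by Gini Index reduction and keeps the top k. It is a minimal sketch under stated assumptions, not the authors' implementation: the toy data, the feature names (chest_pain, smoker, sex), and the helper functions are all hypothetical, and the Chi-Square and ANOVA F-test scorers and the stacking stage are not shown.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a sequence of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_reduction(feature, labels, impurity):
    """Parent impurity minus the size-weighted impurity of the groups
    obtained by splitting on each distinct value of a categorical feature.
    With `entropy` this is Information Gain; with `gini` it is the
    Gini Index reduction."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    weighted = sum(len(g) / n * impurity(g) for g in groups.values())
    return impurity(labels) - weighted


def rank_features(columns, labels, impurity, k):
    """Return the names of the k highest-scoring features."""
    scores = {
        name: impurity_reduction(col, labels, impurity)
        for name, col in columns.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: 1 = CVD present, 0 = absent.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "chest_pain": [1, 1, 1, 1, 0, 0, 0, 0],  # separates the classes perfectly
    "smoker":     [1, 1, 1, 0, 1, 0, 0, 0],  # partially informative
    "sex":        [1, 0, 1, 0, 1, 0, 1, 0],  # uninformative in this toy set
}

top_by_info_gain = rank_features(columns, labels, entropy, k=2)
top_by_gini = rank_features(columns, labels, gini, k=2)
```

On this toy data both criteria keep chest_pain and smoker and drop sex. Real clinical datasets also contain continuous features, which would need discretization or a different scorer such as the ANOVA F-test.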

Keywords