IEEE Access (Jan 2024)
An Integrated Two-Layered Voting (TLV) Framework for Coronary Artery Disease Prediction Using Machine Learning Classifiers
Abstract
Cardiovascular problems have emerged as a significant concern, adversely affecting individuals across all age groups. Several recent studies have used machine learning (ML) techniques to design decision-making systems for the large volumes of data in the medical sector. Although these works obtained promising results, most of them focused on small datasets. Since dataset size affects algorithm performance, this study uses two datasets: Kaggle's cardiovascular disease (CVD) dataset of over 70,000 records and UCI's heart disease dataset of 1,025 records. In addition to the original features, three new features are introduced to improve the results: Pulse Pressure (PP), Body Mass Index (BMI), and Mean Arterial Pressure (MAP). This paper proposes the Two-Layer Voting (TLV) model, an ensemble method that combines hard and soft voting. In layer 1, features are shortlisted by soft and hard voting over three statistical methods: the ANOVA F-test, the Chi-squared test, and Mutual Information. In layer 2, which incorporates Multi-Layer Perceptron, Decision Tree, Support Vector Classifier, and Random Forest classifiers, the performance of soft voting and hard voting is compared; the hyperparameters of these classifiers are tuned with the GridSearchCV method. On UCI's heart disease dataset and Kaggle's CVD dataset, the proposed TLV methodology with soft voting achieved the highest accuracies of 99.03% and 88.09%, respectively. The proposed model significantly outperforms existing CAD prediction studies.
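The abstract names the three derived features but does not give their formulas. The sketch assumes the standard clinical definitions (PP = SBP - DBP, MAP approximated as DBP + PP/3, BMI = weight in kg divided by the square of height in metres); the function name `derived_features` and its parameters are hypothetical, and the paper's exact computation may differ.

```python
def derived_features(sbp_mmhg: float, dbp_mmhg: float,
                     weight_kg: float, height_cm: float) -> dict:
    """Return PP, MAP and BMI using standard clinical formulas (assumed, not from the paper)."""
    pp = sbp_mmhg - dbp_mmhg            # Pulse Pressure: systolic minus diastolic
    mean_ap = dbp_mmhg + pp / 3.0       # Mean Arterial Pressure, common approximation
    height_m = height_cm / 100.0        # convert height from cm to metres
    bmi = weight_kg / height_m ** 2     # Body Mass Index in kg/m^2
    return {"PP": pp, "MAP": mean_ap, "BMI": bmi}


if __name__ == "__main__":
    print(derived_features(120, 80, 70, 175))
```

For example, 120/80 mmHg at 70 kg and 175 cm gives PP = 40 mmHg, MAP of about 93.3 mmHg and BMI of about 22.9 kg/m^2.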
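A minimal scikit-learn sketch of the two layers, under stated assumptions: the abstract does not say how the outputs of the three statistical tests are combined, so here hard voting keeps a feature that at least two of the three tests rank in their top k, and soft voting averages min-max-normalised test scores and keeps the top k. The helper names `layer1_select` and `layer2_voting`, the value of k, and the hyperparameter grids are hypothetical, not the paper's settings; synthetic data stands in for the real datasets.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_selection import chi2, f_classif, mutual_info_classif
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def layer1_select(X, y, k, mode="soft", random_state=0):
    """Layer 1 (hypothetical helper): vote over ANOVA F, chi-squared and MI scores."""
    X_pos = MinMaxScaler().fit_transform(X)  # chi2 requires non-negative inputs
    scores = [
        f_classif(X_pos, y)[0],
        chi2(X_pos, y)[0],
        mutual_info_classif(X_pos, y, random_state=random_state),
    ]
    if mode == "hard":
        # Assumed rule: each test votes for its own top-k; keep features with >= 2 votes.
        votes = np.zeros(X.shape[1], dtype=int)
        for s in scores:
            votes[np.argsort(s)[::-1][:k]] += 1
        return np.flatnonzero(votes >= 2)
    # Assumed rule: rescale each score vector to [0, 1], average, keep the top k.
    norm = [(s - s.min()) / (s.max() - s.min() + 1e-12) for s in scores]
    return np.sort(np.argsort(np.mean(norm, axis=0))[::-1][:k])


def layer2_voting(X_train, y_train, voting="soft", cv=3):
    """Layer 2 (hypothetical helper): tune each classifier with GridSearchCV, then vote."""
    grids = {  # illustrative grids only
        "mlp": (MLPClassifier(max_iter=500, random_state=0),
                {"hidden_layer_sizes": [(16,), (32,)]}),
        "dt": (DecisionTreeClassifier(random_state=0), {"max_depth": [3, 5, None]}),
        "svc": (SVC(probability=True, random_state=0), {"C": [0.5, 1.0, 2.0]}),
        "rf": (RandomForestClassifier(random_state=0), {"n_estimators": [50, 100]}),
    }
    tuned = []
    for name, (estimator, grid) in grids.items():
        search = GridSearchCV(estimator, grid, cv=cv).fit(X_train, y_train)
        tuned.append((name, search.best_estimator_))
    return VotingClassifier(estimators=tuned, voting=voting).fit(X_train, y_train)


if __name__ == "__main__":
    X, y = make_classification(n_samples=300, n_features=12,
                               n_informative=5, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                              random_state=0, stratify=y)
    cols = layer1_select(X_tr, y_tr, k=6, mode="soft")
    model = layer2_voting(X_tr[:, cols], y_tr, voting="soft")
    print("selected:", cols, "test accuracy:", model.score(X_te[:, cols], y_te))
```

Passing `voting="hard"` to `layer2_voting` gives the hard-voting variant that the paper compares against; `SVC(probability=True)` is needed only because soft voting averages the predicted class probabilities of all four classifiers.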
Keywords