IEEE Access (Jan 2024)
Predicting Heart Diseases Using Machine Learning and Different Data Classification Techniques
Abstract
Heart disease (HD), including heart attacks, is a leading cause of death worldwide. In medical data analysis, one of the most difficult problems is estimating the probability that a patient has heart disease. Early detection and continuous monitoring by physicians can lower death rates. Unfortunately, heart disease cannot always be detected accurately, and a physician cannot monitor a patient around the clock. Machine learning (ML) has the potential to aid diagnosis by providing a more precise basis for prediction and by supporting decisions based on data provided by healthcare sectors throughout the world. This study employs several feature selection methods to develop an accurate ML technique for predicting heart disease in its earliest stages. Feature selection was performed using three distinct methods, namely, chi-square, analysis of variance (ANOVA), and mutual information (MI). The three resulting feature groups are referred to as SF-1, SF-2, and SF-3, respectively. Ten different ML classifiers were then used to determine the best technique and the best-fitting feature subset. These classifiers were Naive Bayes, support vector machine (SVM), voting, XGBoost, AdaBoost, bagging, decision tree (DT), K-nearest neighbor (KNN), random forest (RF), and logistic regression (LR), denoted as (A1, A2, …, A10). The proposed approach was evaluated using a private dataset, a publicly available dataset, and multiple cross-validation methods. To address class imbalance when identifying the classifier with the highest rate of accurate heart disease predictions, we applied the Synthetic Minority Oversampling Technique (SMOTE).
The experimental findings demonstrated that the XGBoost classifier achieved the best performance using the combined datasets and the SF-2 feature subset, with 97.57% accuracy, 96.61% sensitivity, 90.48% specificity, 95.00% precision, 92.68% F1 score, and 98% AUC. An explainable artificial intelligence approach based on SHAP is being developed to provide an understanding of how the system arrives at its predictions. The proposed technique has great promise for the healthcare sector as a way to predict early-stage heart disease at low cost and in minimal time. Finally, the best-performing ML method was used to build a mobile app that lets users enter HD symptoms and quickly receive a heart disease prediction.
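For readers unfamiliar with the reported evaluation measures, the following minimal sketch shows how accuracy, sensitivity, specificity, precision, and F1 score are conventionally derived from a binary confusion matrix. It is an illustration under stated assumptions, not the study's evaluation code: the class `ConfusionCounts`, the function `classification_metrics`, and the example counts are hypothetical, and the positive class is assumed to be "heart disease". AUC is omitted because it requires predicted scores rather than counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts; the positive class is assumed to be 'heart disease'."""
    tp: int  # diseased patients predicted as diseased
    fp: int  # healthy patients predicted as diseased
    tn: int  # healthy patients predicted as healthy
    fn: int  # diseased patients predicted as healthy


def classification_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Compute standard metrics; assumes every denominator is non-zero."""
    sensitivity = c.tp / (c.tp + c.fn)   # recall on the positive class
    specificity = c.tn / (c.tn + c.fp)   # recall on the negative class
    precision = c.tp / (c.tp + c.fp)     # fraction of positive predictions that are correct
    accuracy = (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)  # harmonic mean
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts for illustration only; these are NOT the study's results.
example = ConfusionCounts(tp=40, fp=10, tn=45, fn=5)
metrics = classification_metrics(example)

for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Because F1 is the harmonic mean of precision and sensitivity, it always lies between those two values, which gives a quick consistency check when reading a table of reported metrics.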
Keywords