Informatics in Medicine Unlocked (Jan 2023)

Enhanced cardiovascular disease prediction model using random forest algorithm

  • Kellen Sumwiza,
  • Celestin Twizere,
  • Gerard Rushingabigwi,
  • Pierre Bakunzibake,
  • Peace Bamurigire

Journal volume & issue
Vol. 41
p. 101316

Abstract

Cardiovascular diseases (CVDs) such as hypertension, heart failure, stroke, and coronary artery disease are now the major causes of early death worldwide, particularly in low- and middle-income countries. Early detection of these disorders could lower the number of people who die prematurely. Researchers have proposed many techniques for the early detection and monitoring of cardiac patients, including data mining, machine learning (ML), and the Internet of Things (IoT). Although these techniques are suggested and sometimes used, concerns remain about their efficacy in situations where the error rate is high and accuracy is doubtful. It is therefore necessary to select a prediction technique that delivers higher accuracy and fewer errors. This paper proposes an ensemble method based on the Random Forest (RF) algorithm that improves accuracy by combining multiple feature selection techniques. A classification model is built from training datasets and produces several decision trees. These datasets adjust for any missing data and include estimates of important classification variables. Data preparation and feature selection approaches, such as correlation coefficients and data mining feature selection techniques, are applied, and outliers are removed. Finally, a cardiovascular disease prediction model is created, and its performance accuracy is evaluated using a confusion matrix. A dataset from the Kaggle repository with 1025 data points and 14 attributes is used. The saved data are then preprocessed, yielding a dataset of 769 records and 13 variables, which is evaluated using RF and compared with other models, namely K-Nearest Neighbor (K-NN), Support Vector Machine (SVM), and Logistic Regression (LR), for the prediction of heart disorders. After these techniques are applied, the proposed model achieves an accuracy of 99%, a significant improvement over the other assessed models.
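As a rough illustration of how accuracy can be read off a confusion matrix, the sketch below counts the four cells for a binary classifier and derives accuracy from them. The function names and the label vectors are hypothetical, made up for this example; they are not the paper's code, data, or results.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, where 1 means disease present."""
    counts = Counter(zip(y_true, y_pred))
    tp = counts[(1, 1)]  # disease present, predicted present
    fp = counts[(0, 1)]  # disease absent, predicted present
    fn = counts[(1, 0)]  # disease present, predicted absent
    tn = counts[(0, 0)]  # disease absent, predicted absent
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that are correct."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


# Made-up labels for illustration only (assumption: not taken from the paper).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
acc = accuracy(tp, fp, fn, tn)
print(f"TP={tp} FP={fp} FN={fn} TN={tn} accuracy={acc:.2f}")
```

The same four counts also give the error rate, (FP + FN) divided by the total, which is the quantity the abstract contrasts with accuracy.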

Keywords