Machine learning approach for predicting cardiovascular disease in Bangladesh: evidence from a cross-sectional study in 2023

Sorif Hossain; Mohammad Kamrul Hasan; Mohammad Omar Faruk; Nelufa Aktar; Riyadh Hossain; Kabir Hossain

doi:10.1186/s12872-024-03883-2

BMC Cardiovascular Disorders (Apr 2024)

Affiliations

Sorif Hossain: Department of Statistics, Noakhali Science and Technology University
Mohammad Kamrul Hasan: Department of Information and Communication Engineering, Noakhali Science and Technology University
Mohammad Omar Faruk: Department of Statistics, Noakhali Science and Technology University
Nelufa Aktar: Department of Statistics, Noakhali Science and Technology University
Riyadh Hossain: Department of Statistics, Noakhali Science and Technology University
Kabir Hossain: Department of Statistics, Noakhali Science and Technology University

Journal volume & issue: Vol. 24, no. 1
pp. 1 – 28

Abstract

Background

Cardiovascular disorders (CVDs) are the leading cause of death worldwide. Low- and middle-income countries (LMICs) such as Bangladesh are also affected by several types of CVD, including heart failure and stroke. The leading cause of death in Bangladesh has recently shifted from severe infections and parasitic illnesses to CVDs.

Materials and methods

The dataset comprised the medical records of 391 CVD patients, drawn by simple random sampling and collected between August 2022 and April 2023. For comparison, data on 260 individuals without CVD were also collected. Cross-tabulation and chi-square tests were used to assess the association between CVD and the explanatory variables. Logistic Regression, Naïve Bayes, Decision Tree, AdaBoost, Random Forest, Bagging Tree, and ensemble learning classifiers were used to predict CVD. Performance was evaluated using accuracy, sensitivity, specificity, and the area under the receiver operating characteristic curve (AUC-ROC).

Results

Random Forest achieved the highest precision among the classifiers considered. Precision for each classifier was: Logistic Regression (93.67%), Naïve Bayes (94.87%), Decision Tree (96.1%), AdaBoost (94.94%), Random Forest (96.15%), and Bagging Tree (94.87%). The Random Forest classifier also struck the best balance between correct and incorrect predictions: with 98.04% accuracy, it achieved the best precision (96.15%), a recall of 100%, and a high F1 score (97.7%). By contrast, the Logistic Regression model had the lowest accuracy, 95.42%. The Random Forest classifier also achieved the highest AUC (0.989).

Conclusion

This research focused on identifying the factors that critically affect patients with CVD and on predicting CVD risk. The Random Forest technique is strongly recommended for implementation in cardiac disease prediction systems. This research may change clinical practice by giving doctors a new tool for assessing a patient's CVD prognosis.
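The association step described in the methods pairs a cross-tabulation with a chi-square test of independence. Below is a minimal, standard-library sketch for the simplest 2×2 case (one binary explanatory variable against CVD status). The function name `chi_square_2x2` and the counts are hypothetical and are not taken from the study. For variables with more than two categories, a library routine such as `scipy.stats.chi2_contingency` is the usual choice; note that it applies Yates' continuity correction to 2×2 tables by default, which this sketch does not.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    Returns (statistic, p_value). No Yates continuity correction is applied.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / grand total
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # A 2x2 table has 1 degree of freedom, for which P(X >= stat) = erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts, NOT from the study:
# rows = risk factor present / absent, columns = CVD / no CVD
example = [[10, 20],
           [20, 10]]
stat, p = chi_square_2x2(example)
print(f"chi-square = {stat:.3f}, p = {p:.4f}")
```

A small p-value suggests that the explanatory variable and CVD status are not independent in the sample; it says nothing about the direction or size of the effect.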
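The evaluation measures named in the abstract (accuracy, sensitivity, specificity, precision, F1 and AUC-ROC) all derive either from the confusion matrix or from ranked predicted scores. The sketch below computes them in plain Python for a binary outcome (1 = CVD, 0 = no CVD). The function names, the 0.5 threshold and the toy labels and scores are illustrative assumptions, not the authors' code or data. AUC is computed through its rank (Mann-Whitney) interpretation, which equals the area under the empirical ROC curve.

```python
def _ratio(num, den):
    # Return 0.0 instead of raising when a denominator is zero (e.g. no predicted positives)
    return num / den if den else 0.0


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return _ratio(2 * precision * recall, precision + recall)


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall), specificity, precision and F1 for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": recall,
        "specificity": _ratio(tn, tn + fp),
        "precision": precision,
        "f1": f1_from(precision, recall),
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case scores higher than a
    random negative case, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return _ratio(wins, len(pos) * len(neg))


# Hypothetical labels and predicted probabilities, NOT from the study (1 = CVD)
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]
# Assumed decision threshold of 0.5 to turn probabilities into class labels
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(classification_metrics(y_true, y_pred))
print("AUC:", roc_auc(y_true, scores))
```

As a quick arithmetic check, plugging the reported Random Forest precision (0.9615) and recall (1.0) into `f1_from` gives about 0.980. A reported F1 that differs from this may reflect rounding or a different averaging convention (for example, averaging F1 over both classes).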

Published in BMC Cardiovascular Disorders

ISSN: 1471-2261 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Internal medicine: Specialties of internal medicine: Diseases of the circulatory (Cardiovascular) system
Website: https://bmccardiovascdisord.biomedcentral.com