Scientific Reports (Jan 2022)

Application of ensemble machine learning algorithms on lifestyle factors and wearables for cardiovascular risk prediction

  • Weiting Huang,
  • Tan Wei Ying,
  • Woon Loong Calvin Chin,
  • Lohendran Baskaran,
  • Ong Eng Hock Marcus,
  • Khung Keong Yeo,
  • Ng See Kiong

DOI
https://doi.org/10.1038/s41598-021-04649-y
Journal volume & issue
Vol. 12, no. 1
pp. 1 – 12

Abstract

This study examined novel data sources for cardiovascular risk prediction, including a detailed lifestyle questionnaire and continuous blood pressure monitoring, using ensemble machine learning algorithms (MLAs). The conventional reference risk score was the Framingham Risk Score (FRS). The outcome variables were low or high risk, defined as a calcium score of 0 or a calcium score of 100 and above, respectively. Ensemble MLAs were built from naive Bayes, random forest and support vector classifier for the low-risk category, and from generalized linear regression, support vector regressor and stochastic gradient descent regressor for the high-risk category. MLAs were trained on 600 Southeast Asians aged 21 to 69 years who were free of cardiovascular disease (CVD). All MLAs outperformed the FRS for both the low- and high-risk categories. The MLA based on the lifestyle questionnaire alone achieved an AUC of 0.715 (95% CI 0.681, 0.750) for low risk and 0.710 (95% CI 0.653, 0.766) for high risk. Combining all groups of risk factors (lifestyle survey questionnaires, clinical blood tests, 24-h ambulatory blood pressure and heart rate monitoring) with feature selection further improved the AUC to 0.791 (95% CI 0.759, 0.822) for low risk and 0.790 (95% CI 0.745, 0.836) for high risk. Besides conventional predictors, self-reported physical activity, average daily heart rate, awake blood pressure variability and percentage time in diastolic hypertension were important contributors to CVD risk classification.
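The results above are reported as an AUC with a 95% confidence interval. The abstract does not say how the intervals were computed, so the following is only a minimal sketch of one common approach: a rank-based AUC combined with a percentile bootstrap. The function names, the synthetic data and the bootstrap settings are all assumptions made for illustration, not the authors' method.

```python
import random

def auc(labels, scores):
    """AUC as the probability that a random positive case outscores
    a random negative case (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample lacks one class, so AUC is undefined
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Synthetic example: 200 hypothetical participants, with scores that are
# higher on average for the positive class. Not the study data.
rng = random.Random(42)
labels = [rng.randint(0, 1) for _ in range(200)]
scores = [rng.gauss(float(y), 1.0) for y in labels]

point = auc(labels, scores)
low, high = bootstrap_ci(labels, scores)
print(f"AUC {point:.3f} (95% CI {low:.3f}, {high:.3f})")
```

The pairwise AUC is quadratic in the sample size, which is acceptable for a cohort of a few hundred participants; a sort-based rank formula would be the usual choice for larger data.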