Scientific Reports (Dec 2024)
Predicting high sensitivity C-reactive protein levels and their associations in a large population using decision tree and linear regression
Abstract
High-sensitivity C-reactive protein (hs-CRP) is a biomarker of inflammation that predicts the incidence of various pathologies. In this study, we aimed to evaluate the association of hematological and demographic factors with hs-CRP levels using decision tree (DT) and linear regression (LR) modeling. The study was conducted on a population of 9,704 males and females aged 35 to 65 years recruited from the Mashhad Stroke and Heart Atherosclerotic Disorder (MASHAD) cohort study. We used a data mining approach, the DT method, to build a predictive model of hs-CRP levels from biochemical factors and clinical features. Of the 9,704 individuals included in the analysis, 57% were female. Except for fasting blood glucose (FBG), hypertension (HTN), and type 2 diabetes mellitus (T2DM), all variables showed significant differences between the two groups. The LR models showed that anxiety score, depression score, systolic blood pressure, cardiovascular disease, and HTN were significant predictors of hs-CRP levels. In the DT models, depression score, FBG, cholesterol, and anxiety score were identified as the most important factors in predicting hs-CRP levels. The DT model predicted hs-CRP level with an accuracy of 72.1% in training and 71.4% in testing for both genders. The proposed DT model appears to be able to predict hs-CRP levels based on anxiety score, depression score, fasting blood glucose, systolic blood pressure, and history of cardiovascular disease.
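As a rough illustration of the two model families named above, the following sketch fits a one-predictor least-squares regression and a depth-one decision tree (a "stump") and reports training and testing accuracy. It is not the authors' model and does not use MASHAD data: the data are synthetic, the function names (`fit_simple_ols`, `fit_stump`, `accuracy`) are hypothetical, and the 3 mg/L cut-off used to label hs-CRP as high, the 0.15 slope, and the noise level are assumptions chosen only to make the example run.

```python
import random

rng = random.Random(0)  # fixed seed so the synthetic data are reproducible

# Synthetic cohort (assumption): hs-CRP rises with depression score, FBG is unrelated.
rows, hs_crp = [], []
for _ in range(400):
    depression = rng.uniform(0, 30)   # hypothetical depression score
    fbg = rng.uniform(70, 200)        # hypothetical fasting blood glucose, mg/dL
    rows.append((depression, fbg))
    hs_crp.append(1.0 + 0.15 * depression + rng.gauss(0, 0.5))

labels = [1 if v > 3.0 else 0 for v in hs_crp]  # assumed cut-off: 3 mg/L
train_rows, test_rows = rows[:300], rows[300:]
train_labels, test_labels = labels[:300], labels[300:]


def fit_simple_ols(x, y):
    """Least-squares slope and intercept for a single predictor."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_stump(data, target):
    """Depth-one decision tree: the single split with the highest training accuracy."""
    best = None
    for j in range(len(data[0])):
        values = sorted(set(r[j] for r in data))
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            for left_label in (0, 1):
                hits = sum(
                    (left_label if r[j] <= threshold else 1 - left_label) == y
                    for r, y in zip(data, target)
                )
                if best is None or hits > best[0]:
                    best = (hits, j, threshold, left_label)
    return best[1:]  # (feature index, threshold, label assigned to the left branch)


def accuracy(stump, data, target):
    """Share of rows whose predicted class matches the target."""
    j, threshold, left_label = stump
    hits = sum(
        (left_label if r[j] <= threshold else 1 - left_label) == y
        for r, y in zip(data, target)
    )
    return hits / len(target)


# Linear regression of hs-CRP on depression score (training rows only).
slope, intercept = fit_simple_ols([r[0] for r in train_rows], hs_crp[:300])

# Decision stump for high versus low hs-CRP, scored on both splits.
stump = fit_stump(train_rows, train_labels)
train_acc = accuracy(stump, train_rows, train_labels)
test_acc = accuracy(stump, test_rows, test_labels)

print(f"LR slope={slope:.3f}, intercept={intercept:.3f}")
print(f"DT split on feature {stump[0]} at {stump[1]:.2f}")
print(f"accuracy: train={train_acc:.3f}, test={test_acc:.3f}")
```

Comparing training and testing accuracy, as in the figures reported above, is a simple check for overfitting: a tree that scores much higher on the training split than on the held-out split has likely memorized noise. A full analysis would use a deeper tree with several predictors and a proper validation scheme.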
Keywords