Frontiers in Endocrinology (Feb 2024)

Predicting the risk of subclinical atherosclerosis based on interpretable machine models in a Chinese T2DM population

  • Ximisinuer Tusongtuoheti,
  • Yimeng Shu,
  • Guoqing Huang,
  • Yushan Mao

DOI
https://doi.org/10.3389/fendo.2024.1332982
Journal volume & issue
Vol. 15

Abstract

Background: Cardiovascular disease (CVD) has emerged as a global public health concern. Identifying and preventing subclinical atherosclerosis (SCAS), an early indicator of CVD, is critical for improving cardiovascular outcomes. This study aimed to construct interpretable machine learning models for predicting SCAS risk in type 2 diabetes mellitus (T2DM) patients.

Methods: This study included 3084 T2DM individuals who received health care at Zhenhai Lianhua Hospital, Ningbo, China, from January 2018 to December 2022. The least absolute shrinkage and selection operator combined with random forest-recursive feature elimination were used to screen for characteristic variables. Linear discriminant analysis, logistic regression, Naive Bayes, random forest, support vector machine, and extreme gradient boosting were employed in constructing risk prediction models for SCAS in T2DM patients. The area under the receiver operating characteristic curve (AUC) was employed to assess the predictive capacity of the model through 10-fold cross-validation. Additionally, the SHapley Additive exPlanations were utilized to interpret the best-performing model.

Results: The percentage of SCAS was 38.46% (n=1186) in the study population. Fourteen variables, including age, white blood cell count, and basophil count, were identified as independent risk factors for SCAS. Nine predictors, including age, albumin, and total protein, were screened for the construction of risk prediction models. After validation, the random forest model exhibited the best clinical predictive value in the training set with an AUC of 0.729 (95% CI: 0.709-0.749), and it also demonstrated good predictive value in the internal validation set [AUC: 0.715 (95% CI: 0.688-0.742)]. The model interpretation revealed that age, albumin, total protein, total cholesterol, and serum creatinine were the top five variables contributing to the prediction model.

Conclusion: The construction of SCAS risk models based on the Chinese T2DM population contributes to its early prevention and intervention, which would reduce the incidence of adverse cardiovascular prognostic events.
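
The evaluation step described in the abstract (AUC estimated through 10-fold cross-validation) can be illustrated with a minimal sketch. This is not the authors' pipeline. It assumes synthetic data in place of the hospital cohort, swaps in a small hand-written logistic regression for the six models compared in the study, and omits the feature-selection and SHAP steps. All function names (`auc`, `stratified_folds`, `fit_logistic`, `cross_validated_auc`, `make_synthetic`) are hypothetical and introduced only for illustration.

```python
import math
import random


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(y, k, rng):
    """Deal the shuffled indices of each class round-robin into k folds."""
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, yi in enumerate(y) if yi == cls]
        rng.shuffle(idx)
        for position, i in enumerate(idx):
            folds[position % k].append(i)
    return folds


def fit_logistic(X, y, lr=0.5, epochs=100):
    """Batch gradient descent for logistic regression (stand-in model, an assumption)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def linear_scores(model, X):
    """Linear scores are monotone in the predicted probability, so the AUC is unchanged."""
    w, b = model
    return [sum(wj * xj for wj, xj in zip(w, xi)) + b for xi in X]


def cross_validated_auc(X, y, k=10, seed=0):
    """Train on k-1 folds, score the held-out fold, and return one AUC per fold."""
    folds = stratified_folds(y, k, random.Random(seed))
    fold_aucs = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        model = fit_logistic([X[i] for i in train], [y[i] for i in train])
        scores = linear_scores(model, [X[i] for i in held_out])
        fold_aucs.append(auc([y[i] for i in held_out], scores))
    return fold_aucs


def make_synthetic(n, seed):
    """Synthetic two-feature data; only the first feature carries signal."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0, 1), rng.gauss(0, 1)]
        p = 1.0 / (1.0 + math.exp(-1.5 * x[0]))
        X.append(x)
        y.append(1 if rng.random() < p else 0)
    return X, y


X, y = make_synthetic(300, seed=1)
fold_aucs = cross_validated_auc(X, y, k=10)
print("mean 10-fold AUC:", sum(fold_aucs) / len(fold_aucs))
```

Stratifying the folds keeps both outcome classes present in every held-out fold, which the pairwise AUC calculation requires. Whether the study stratified its folds is not stated in the abstract, so that choice is an assumption of this sketch.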

Keywords