PLoS ONE (Jan 2021)

Assessing reproducibility and utility of clustering of patients with type 2 diabetes and established CV disease (SAVOR-TIMI 53 trial).

  • Yasunori Aoki,
  • Bengt Hamrén,
  • Lindsay E Clegg,
  • Christina Stahre,
  • Deepak L Bhatt,
  • Itamar Raz,
  • Benjamin M Scirica,
  • Jan Oscarsson,
  • Björn Carlsson

DOI
https://doi.org/10.1371/journal.pone.0259372
Journal volume & issue
Vol. 16, no. 11
p. e0259372

Abstract

Objective: To assess the reproducibility and clinical utility of clustering-based subtyping of patients with type 2 diabetes (T2D) and established cardiovascular (CV) disease.

Methods: The cardiovascular outcome trial SAVOR-TIMI 53 (n = 16,492) was used. Analyses focused on T2D patients with established CV disease. An unsupervised machine learning technique, k-means clustering, was used to divide patients into subtypes. K-means clustering on HbA1c, age at diagnosis, BMI, HOMA2-IR and HOMA2-B was used to assign clusters to the following diabetes subtypes: severe insulin-deficient diabetes (SIDD), severe insulin-resistant diabetes (SIRD), mild obesity-related diabetes (MOD) and mild age-related diabetes (MARD). We refer to these subtypes as "clustering-based diabetes subtypes". A simulation study using randomly generated data was conducted to understand how correlations between the above variables influence the formation of the clustering-based diabetes subtypes. The predictive utility of clustering-based diabetes subtypes for CV events (3-point MACE), renal function reduction (eGFR decrease >30%) and diabetic disease progression (introduction of additional anti-diabetic medication) was compared with that of conventional risk scores. Hazard ratios (HR) were estimated with Cox proportional hazards models.

Results: In the SAVOR-TIMI 53 trial-based dataset, the percentages of the clustering-based T2D subtypes were: SIDD (18%), SIRD (17%), MOD (29%), MARD (37%). Using the simulated dataset, the diabetes subtypes could be largely reproduced from a log-normal distribution when known correlations between variables were included. The predictive utility of clustering-based diabetes subtypes for CV events, renal function reduction, and diabetic disease progression did not show an advantage over conventional risk scores.

Conclusions: The consistent reproduction of four clustering-based T2D subtypes can be explained by the correlations between the variables used for clustering. Subtypes of T2D based on clustering had limited advantage over conventional risk scores in predicting clinical outcomes in patients with T2D and established CV disease.
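
The simulation idea described in the Methods (k-means applied to randomly generated, correlated log-normal data) can be illustrated with a minimal sketch. This is not the authors' code. The function names, the one-factor correlation structure, the common correlation value of 0.3, the sample size, and the standardization step are all assumptions chosen for illustration; the abstract does not state the actual correlation values or preprocessing.

```python
import math
import random

def simulate_correlated_lognormal(n, n_vars, rho, seed=0):
    """Draw n rows of n_vars log-normal values whose logs share a
    pairwise correlation rho (hypothetical one-factor construction)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        shared = rng.gauss(0.0, 1.0)  # latent factor common to all variables
        row = []
        for _ in range(n_vars):
            z = math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            row.append(math.exp(z))  # exponentiate a normal draw -> log-normal
        rows.append(row)
    return rows

def standardize(rows):
    """Scale every column to mean 0 and standard deviation 1."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / n) for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def kmeans(rows, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm: assign each row to its nearest center,
    then move each center to the mean of its members."""
    rng = random.Random(seed)
    centers = [list(r) for r in rng.sample(rows, k)]
    labels = [0] * len(rows)
    for _ in range(n_iter):
        new_labels = [
            min(range(k), key=lambda j: sum((x - c) ** 2 for x, c in zip(row, centers[j])))
            for row in rows
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Five placeholder variables stand in for HbA1c, age at diagnosis, BMI,
# HOMA2-IR and HOMA2-B; rho = 0.3 is an arbitrary illustrative value.
data = standardize(simulate_correlated_lognormal(n=2000, n_vars=5, rho=0.3))
labels, centers = kmeans(data, k=4)
shares = [labels.count(j) / len(labels) for j in range(4)]
print("cluster shares:", [round(s, 2) for s in shares])
```

Because k-means always returns k clusters, a sketch like this partitions even purely synthetic data into four groups. Comparing the cluster centers across different values of `rho` is one way to explore how the correlation structure shapes the resulting groups.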