Frontiers in Genetics (Feb 2023)

Integrated genomic analysis defines molecular subgroups in dilated cardiomyopathy and identifies novel biomarkers based on machine learning methods

  • Ling-Fang Ye,
  • Jia-Yi Weng,
  • Li-Da Wu

DOI
https://doi.org/10.3389/fgene.2023.1050696
Journal volume & issue
Vol. 14

Abstract

Aim: As the most common cardiomyopathy, dilated cardiomyopathy (DCM) often leads to progressive heart failure and sudden cardiac death. This study was designed to investigate the molecular subgroups of DCM.

Methods: Three DCM datasets (GSE17800, GSE79962 and GSE3585) were downloaded from the GEO database. After log2 transformation and background correction with the “limma” package in R, the three datasets were merged into a single meta-cohort. Consensus clustering was performed with the “ConsensusClusterPlus” package to uncover the molecular subgroups of DCM, and the clinical characteristics of the resulting subgroups were compared in detail. Weighted gene co-expression network analysis (WGCNA), based on the subgroup-specific gene expression signatures, was then used to identify the gene modules specific to each molecular subgroup and to explore their biological functions. Two machine learning methods, the LASSO regression algorithm and the SVM-RFE algorithm, were used to screen for genetic biomarkers, and the ability of these biomarkers to discriminate between molecular subgroups was evaluated with receiver operating characteristic (ROC) curves.

Results: Based on their gene expression profiles, heart tissue samples from patients with DCM were clustered into three molecular subgroups. Age, body mass index (BMI) and left ventricular internal diameter at end-diastole (LVIDD) did not differ significantly among the three subgroups. However, left ventricular ejection fraction (LVEF) was worse in subgroup 2 than in the other two subgroups. Several WGCNA gene modules (pink, black and grey) were significantly associated with cardiac function, and each molecular subgroup had specific gene modules whose functions relate to the occurrence and progression of DCM.
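As a rough illustration of the merging and consensus-clustering steps described in the Methods, the Python sketch below log2-transforms several expression matrices, centers each gene within its own dataset, and builds a consensus matrix by repeatedly clustering random 80% subsamples of the samples. It is a sketch under stated assumptions, not the authors' pipeline: the per-dataset gene centering, the +1 pseudocount, k-means as the base clusterer (the R package supports several, including hierarchical clustering and k-means), the PAC score and the connected-component labelling are illustrative choices, and the helper names (merge_log2, consensus_matrix, pac, consensus_labels) and the synthetic data are hypothetical.

```python
import numpy as np


def merge_log2(batches):
    """Log2-transform each genes x samples matrix, center every gene within
    its own dataset (a crude stand-in for batch correction) and merge."""
    centered = []
    for counts in batches:
        logged = np.log2(counts + 1.0)  # +1 pseudocount: an assumption
        centered.append(logged - logged.mean(axis=1, keepdims=True))
    return np.hstack(centered)  # genes x all samples


def kmeans(x, k, rng, n_init=5, n_iter=100):
    """Lloyd's k-means with k-means++ seeding; returns the lowest-inertia labels."""
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        centers = [x[rng.integers(len(x))]]
        for _ in range(k - 1):
            d2 = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
            centers.append(x[rng.choice(len(x), p=d2 / d2.sum())])
        centers = np.array(centers)
        for _ in range(n_iter):
            dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        inertia = dist[np.arange(len(x)), labels].sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels


def consensus_matrix(x, k, n_resamples=100, p_item=0.8, seed=0):
    """x is samples x genes. Entry (i, j) is the fraction of resamples that
    contained both i and j in which they landed in the same cluster."""
    rng = np.random.default_rng(seed)
    n = len(x)
    together, sampled = np.zeros((n, n)), np.zeros((n, n))
    m = int(round(p_item * n))
    for _ in range(n_resamples):
        idx = rng.choice(n, size=m, replace=False)
        labels = kmeans(x[idx], k, rng)
        together[np.ix_(idx, idx)] += labels[:, None] == labels[None, :]
        sampled[np.ix_(idx, idx)] += 1.0
    return np.divide(together, sampled, out=np.zeros_like(together), where=sampled > 0)


def pac(cm, lower=0.1, upper=0.9):
    """Proportion of ambiguous clustering among off-diagonal entries (lower is better)."""
    off = cm[~np.eye(len(cm), dtype=bool)]
    return float(np.mean((off > lower) & (off < upper)))


def consensus_labels(cm, threshold=0.5):
    """Final subgroups: connected components of the graph consensus >= threshold."""
    labels = -np.ones(len(cm), dtype=int)
    current = 0
    for start in range(len(cm)):
        if labels[start] >= 0:
            continue
        labels[start], stack = current, [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero((cm[i] >= threshold) & (labels < 0)):
                labels[j] = current
                stack.append(j)
        current += 1
    return labels


# Synthetic demo: three "datasets" of 6 genes x 9 samples, three subgroups in each,
# plus a per-dataset shift that the centering step removes.
rng = np.random.default_rng(1)
batches = []
for b in range(3):
    log_expr = 4.0 + b + rng.normal(0.0, 0.2, (6, 9))
    for s, g in enumerate(np.repeat([0, 1, 2], 3)):
        log_expr[2 * g:2 * g + 2, s] += 4.0  # subgroup g is high on genes 2g, 2g+1
    batches.append(2.0 ** log_expr - 1.0)   # back to a linear scale
truth = np.tile(np.repeat([0, 1, 2], 3), 3)

x = merge_log2(batches).T                   # samples x genes
cm = consensus_matrix(x, k=3)
labels = consensus_labels(cm)
print(pac(cm), labels)
```

Consensus values near 0 or 1 mean a pair of samples is clustered the same way across resamples; comparing a stability score such as PAC across candidate values of k is one common way to choose the number of subgroups.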
The LASSO regression and SVM-RFE algorithms were then used to screen for genetic biomarkers of molecular subgroup 2, yielding TCEAL4, ISG15, RWDD1, ALG5, MRPL20, JTB and LITAF. ROC curves showed that all of these biomarkers had good discriminative ability.

Conclusion: Patients in different molecular subgroups have distinct gene expression patterns and clinical characteristics. More personalized treatment guided by gene expression patterns should therefore be pursued.
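The biomarker-screening and ROC steps can be sketched in the same hedged way. The example below fits a squared-error LASSO by coordinate descent (a simplification; a logistic LASSO, such as glmnet fits with family = "binomial", is the more usual choice for a binary label), runs SVM-RFE with a linear SVM trained by subgradient descent, keeps the genes selected by both methods (an assumed combination rule; the abstract does not say how the two lists were merged), and scores each remaining gene with a rank-based ROC AUC for "subgroup 2 versus the rest". All function names, regularization settings and the synthetic data are hypothetical.

```python
import numpy as np


def lasso(X, y, lam, n_sweeps=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1 (Gaussian LASSO)."""
    n, p = X.shape
    b = np.zeros(p)
    r = y.astype(float).copy()              # residual y - Xb, with b = 0
    for _ in range(n_sweeps):
        for j in range(p):
            r += X[:, j] * b[j]             # take feature j out of the fit
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / (X[:, j] @ X[:, j] / n)
            r -= X[:, j] * b[j]             # put it back with the soft-thresholded weight
    return b


def linear_svm(X, y, reg=0.01, lr=0.1, n_iter=500):
    """Full-batch subgradient descent on the L2-regularized hinge loss; y in {-1, +1}."""
    w, bias = np.zeros(X.shape[1]), 0.0
    for _ in range(n_iter):
        active = y * (X @ w + bias) < 1     # samples inside the margin
        grad_w = reg * w - (y[active][:, None] * X[active]).sum(axis=0) / len(y)
        grad_b = -y[active].sum() / len(y)
        w, bias = w - lr * grad_w, bias - lr * grad_b
    return w


def svm_rfe(X, y, n_keep):
    """Recursive feature elimination: refit, drop the feature with the smallest w_j^2."""
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_keep:
        w = linear_svm(X[:, remaining], y)
        remaining.pop(int(np.argmin(w ** 2)))
    return set(remaining)


def roc_auc(scores, labels):
    """AUC as P(score of a random positive > score of a random negative), ties = 1/2."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))


# Synthetic demo: 120 samples x 10 genes; label 1 marks a hypothetical
# "subgroup 2" and only genes 0-2 differ between the classes.
rng = np.random.default_rng(7)
y01 = np.r_[np.ones(40), np.zeros(80)].astype(int)
expr = rng.normal(0.0, 1.0, (120, 10))
expr[:, :3] += 3.0 * y01[:, None]
Xs = (expr - expr.mean(axis=0)) / expr.std(axis=0)   # standardize each gene

lasso_set = {int(j) for j in np.flatnonzero(lasso(Xs, y01 - y01.mean(), lam=0.1))}
svm_set = svm_rfe(Xs, 2.0 * y01 - 1.0, n_keep=3)
biomarkers = lasso_set & svm_set          # assumption: keep genes picked by both
aucs = {g: roc_auc(expr[:, g], y01) for g in sorted(biomarkers)}
print(sorted(biomarkers), aucs)
```

Computing the AUC as a Mann-Whitney probability avoids tracing the full ROC curve; for a single continuous score the two give the same area.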

Keywords