Journal of Big Data (Mar 2024)

Integration of transcriptomic analysis and multiple machine learning approaches identifies NAFLD progression-specific hub genes to reveal distinct genomic patterns and actionable targets

  • Jing Sun,
  • Run Shi,
  • Yang Wu,
  • Yan Lou,
  • Lijuan Nie,
  • Chun Zhang,
  • Yutian Cao,
  • Qianhua Yan,
  • Lifang Ye,
  • Shu Zhang,
  • Xuanbin Wang,
  • Qibiao Wu,
  • Xuehua Jiao,
  • Jiangyi Yu,
  • Zhuyuan Fang,
  • Xiqiao Zhou

DOI
https://doi.org/10.1186/s40537-024-00899-5
Journal volume & issue
Vol. 11, no. 1
pp. 1 – 20

Abstract

Read online

Abstract Background Nonalcoholic fatty liver disease (NAFLD) is a leading public health problem worldwide. Approximately one fourth of patients with nonalcoholic fatty liver (NAFL) progress to nonalcoholic steatohepatitis (NASH), an advanced stage of NAFLD. Hence, there is an urgent need to make a better understanding of NAFLD heterogeneity and facilitate personalized management of high-risk NAFLD patients who may benefit from more intensive surveillance and preventive intervene. Methods In this study, a series of bioinformatic methods were performed to identify NAFLD progression-specific pathways and genes, and three machine learning approaches were combined to construct a risk-stratification gene signature to quantify risk assessment. In addition, bulk RNA-seq, single-cell RNA-seq (scRNA-seq) transcriptome profiling data and whole-exome sequencing (WES) data were comprehensively analyzed to reveal the genomic alterations and altered pathways between distinct molecular subtypes. Results Two distinct subtypes of NAFL were identified with the NAFLD progression-specific genes, and one subtype has a high similarity of the inflammatory pattern and fibrotic potential with NASH. The established risk-stratification gene signature could discriminate advanced samples from overall NAFLD. COL1A2, one key gene closely related to NAFLD progression, is specifically expressed in fibroblasts involved in hepatocellular carcinoma (HCC), and significantly correlated with EMT and angiogenesis in pan-cancer. Moreover, the β-catenin/COL1A2 axis might play a critical role in fibrosis severity and inflammatory response during NAFLD-HCC progression. Conclusion In summary, our study provided evidence for the necessity of molecular classification and established a risk-stratification gene signature to quantify risk assessment of NAFLD, aiming to identify different risk subsets and to guide personalized treatment.

Keywords