Scientific Reports (May 2022)

Identification of gene signatures for COAD using feature selection and Bayesian network approaches

  • Yangyang Wang,
  • Xiaoguang Gao,
  • Xinxin Ru,
  • Pengzhan Sun,
  • Jihan Wang

DOI
https://doi.org/10.1038/s41598-022-12780-7
Journal volume & issue
Vol. 12, no. 1
pp. 1 – 13

Abstract

Combining the TCGA and GTEx databases provides more comprehensive information for characterizing the human genome in health and disease, especially for uncovering the genetic alterations underlying cancer. Here we analyzed the gene expression profile of colon adenocarcinoma (COAD) in tumor samples from TCGA and normal colon tissues from GTEx. Using the SNR-PPFS feature selection algorithms, we identified a 38-gene signature that performed well in distinguishing COAD tumors from normal samples. A Bayesian network of the 38 genes revealed that differentially expressed genes (DEGs) with similar expression patterns or functions interacted more closely. We identified 14 upregulated DEGs (up-DEGs) that were significantly correlated with tumor stage. Cox regression analysis demonstrated that tumor stage and dysregulation of STMN4 and FAM135B were independent prognostic factors for COAD survival outcomes. Overall, this study indicates that using feature selection to extract key gene signatures from high-dimensional datasets can be an effective way to study the genomic characteristics of cancer.
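
The abstract does not spell out how SNR-PPFS works, so the following is only a rough sketch of the signal-to-noise ratio (SNR) ranking idea, not the authors' pipeline. It assumes the common definition |mean_tumor - mean_normal| / (sd_tumor + sd_normal), uses synthetic data in place of TCGA/GTEx expression values, and leaves out the PPFS step entirely. The function names `snr_scores` and `top_k_genes` are hypothetical.

```python
import numpy as np


def snr_scores(X, y):
    """Per-gene signal-to-noise ratio between two classes.

    X: array of shape (n_samples, n_genes) with expression values.
    y: array of shape (n_samples,), 1 for tumor and 0 for normal.
    Assumed definition: |mean_tumor - mean_normal| / (sd_tumor + sd_normal).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    tumor, normal = X[y == 1], X[y == 0]
    diff = tumor.mean(axis=0) - normal.mean(axis=0)
    spread = tumor.std(axis=0, ddof=1) + normal.std(axis=0, ddof=1)
    # Small constant guards against division by zero for constant genes.
    return np.abs(diff) / (spread + 1e-12)


def top_k_genes(X, y, k):
    """Return the column indices of the k genes with the highest SNR."""
    scores = snr_scores(X, y)
    return np.argsort(scores)[::-1][:k]


# Synthetic stand-in for an expression matrix (NOT real TCGA/GTEx data).
rng = np.random.default_rng(0)
n_tumor, n_normal, n_genes = 40, 40, 200
X = rng.normal(size=(n_tumor + n_normal, n_genes))
y = np.array([1] * n_tumor + [0] * n_normal)
X[y == 1, :5] += 3.0  # shift genes 0-4 upward in the "tumor" group

selected = top_k_genes(X, y, k=5)
print(sorted(selected.tolist()))
```

In a real analysis, the ranked genes would typically be passed to a second selection or classification step and validated on held-out samples; the cut-off `k` here is arbitrary.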