BMC Cancer (Oct 2022)

Using machine learning to identify gene interaction networks associated with breast cancer

  • Liyuan Liu,
  • Wenli Zhai,
  • Fei Wang,
  • Lixiang Yu,
  • Fei Zhou,
  • Yujuan Xiang,
  • Shuya Huang,
  • Chao Zheng,
  • Zhongshang Yuan,
  • Yong He,
  • Zhigang Yu,
  • Jiadong Ji

DOI
https://doi.org/10.1186/s12885-022-10170-w
Journal volume & issue
Vol. 22, no. 1
pp. 1 – 9

Abstract

Read online

Abstract Background Breast cancer (BC) is one of the most prevalent cancers worldwide but its etiology remains unclear. Obesity is recognized as a risk factor for BC, and many obesity-related genes may be involved in its occurrence and development. Research assessing the complex genetic mechanisms of BC should not only consider the effect of a single gene on the disease, but also focus on the interaction between genes. This study sought to construct a gene interaction network to identify potential pathogenic BC genes. Methods The study included 953 BC patients and 963 control individuals. Chi-square analysis was used to assess the correlation between demographic characteristics and BC. The joint density-based non-parametric differential interaction network analysis and classification (JDINAC) was used to build a BC gene interaction network using single nucleotide polymorphisms (SNP). The odds ratio (OR) and 95% confidence interval (95% CI) of hub gene SNPs were evaluated using a logistic regression model. To assess reliability, the hub genes were quantified by edgeR program using BC RNA-seq data from The Cancer Genome Atlas (TCGA) and identical edges were verified by logistic regression using UK Biobank datasets. Go and KEGG enrichment analysis were used to explore the biological functions of interactive genes. Results Body mass index (BMI) and menopause are important risk factors for BC. After adjusting for potential confounding factors, the BC gene interaction network was identified using JDINAC. LEP, LEPR, XRCC6, and RETN were identified as hub genes and both hub genes and edges were verified. LEPR genetic polymorphisms (rs1137101 and rs4655555) were also significantly associated with BC. Enrichment analysis showed that the identified genes were mainly involved in energy regulation and fat-related signaling pathways. Conclusion We explored the interaction network of genes derived from SNP data in BC progression. Gene interaction networks provide new insight into the underlying mechanisms of BC.

Keywords