BioData Mining (Jan 2025)

Genome-wide association studies are enriched for interacting genes

  • Peter T. Nguyen,
  • Simon G. Coetzee,
  • Irina Silacheva,
  • Dennis J. Hazelett

DOI
https://doi.org/10.1186/s13040-024-00421-w
Journal volume & issue
Vol. 18, no. 1
pp. 1 – 18

Abstract

Background

With recent advances in single-cell technology, high-throughput methods provide unique insight into disease mechanisms and, more importantly, into the cell types in which disease originates. Here, we used multi-omics data to understand how genetic variants identified by genome-wide association studies influence the development of disease. We show in principle how to use genetic algorithms with normal, matched pairs of single-nucleus RNA-seq and ATAC-seq data, genome annotations, and protein-protein interaction data to jointly describe the genes and cell types involved and their contribution to increased risk.

Results

We used genetic algorithms to measure the fitness of gene-cell set proposals against a series of objective functions that capture the data and annotations. The most informative objective function was the one capturing protein-protein interactions. We observed significantly greater fitness scores and subgraph sizes for foreground variants than for matched sets of control variants. Furthermore, our model reliably identified known targets and ligand-receptor pairs, consistent with prior studies.

Conclusions

Our findings suggest that applying genetic algorithms to association studies can generate a coherent cellular model of risk from a set of susceptibility variants. Further, using breast cancer as an example, we showed that such variants have a greater number of physical interactions than expected by chance.
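To make the idea of scoring gene set proposals with a genetic algorithm more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical, simplified setting: each risk locus has a few candidate genes, a proposal picks one gene per locus, and fitness is simply the number of protein-protein interaction (PPI) edges among the chosen genes. The toy loci, gene names, and PPI edges (`LOCI`, `PPI`) are invented for illustration; the published method also assigns cell types and uses several objective functions, which are omitted here. Selection (tournament), uniform crossover, per-locus mutation, and elitism are generic genetic-algorithm choices assumed for the sketch.

```python
import random
from itertools import combinations

# Hypothetical toy inputs (not from the paper): candidate genes per risk locus
# and an undirected PPI edge set stored as frozensets so edge order is ignored.
LOCI = [["A1", "A2"], ["B1", "B2", "B3"], ["C1", "C2"], ["D1", "D2"]]
PPI = {frozenset(e) for e in [("A1", "B2"), ("B2", "C1"), ("C1", "D2"),
                              ("A2", "D1"), ("A1", "C1")]}

def fitness(proposal):
    """Count PPI edges among the genes chosen by a proposal.

    A proposal is a tuple with one candidate-gene index per locus.
    """
    genes = [LOCI[i][j] for i, j in enumerate(proposal)]
    return sum(frozenset(pair) in PPI for pair in combinations(genes, 2))

def random_proposal(rng):
    # Pick one candidate gene uniformly at random for every locus.
    return tuple(rng.randrange(len(cands)) for cands in LOCI)

def mutate(proposal, rng, rate=0.2):
    # Re-draw each locus's gene choice with probability `rate`.
    return tuple(rng.randrange(len(LOCI[i])) if rng.random() < rate else g
                 for i, g in enumerate(proposal))

def crossover(a, b, rng):
    # Uniform crossover: each locus inherits its gene choice from either parent.
    return tuple(x if rng.random() < 0.5 else y for x, y in zip(a, b))

def tournament(pop, rng, k=3):
    # Return the fittest of k randomly sampled proposals.
    return max(rng.sample(pop, k), key=fitness)

def run_ga(pop_size=30, generations=50, seed=0):
    rng = random.Random(seed)
    pop = [random_proposal(rng) for _ in range(pop_size)]
    for _ in range(generations):
        elite = max(pop, key=fitness)  # elitism: best proposal survives unchanged
        children = [
            mutate(crossover(tournament(pop, rng), tournament(pop, rng), rng), rng)
            for _ in range(pop_size - 1)
        ]
        pop = [elite] + children
    return max(pop, key=fitness)

if __name__ == "__main__":
    best = run_ga()
    print([LOCI[i][g] for i, g in enumerate(best)], fitness(best))
```

In this toy graph the densest choice is A1, B2, C1, D2, which forms a connected PPI subgraph with four edges. Comparing the best fitness reached for a foreground set of loci against the fitness reached for matched control sets would be one way to frame the foreground vs. control comparison described above, although the paper's exact statistics are not reproduced here.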

Keywords