F1000Research (Nov 2021)

PEGS: An efficient tool for gene set enrichment within defined sets of genomic intervals [version 2; peer review: 2 approved]

  • Peter Briggs,
  • A. Louise Hunter,
  • Shen-hsi Yang,
  • Andrew D. Sharrocks,
  • Mudassar Iqbal

DOI
https://doi.org/10.12688/f1000research.53926.2
Journal volume & issue
Vol. 10

Abstract

Read online

Many biological studies of transcriptional control mechanisms produce lists of genes and non-coding genomic intervals from corresponding gene expression and epigenomic assays. In higher organisms, such as eukaryotes, genes may be regulated by distal elements, with these elements lying 10s–100s of kilobases away from a gene transcription start site. To gain insight into these distal regulatory mechanisms, it is important to determine comparative enrichment of genes of interest in relation to genomic regions of interest, and to be able to do so at a range of distances. Existing bioinformatics tools can annotate genomic regions to nearest known genes, or look for transcription factor binding sites in relation to gene transcription start sites. Here, we present PEGS (Peak set Enrichment in Gene Sets). This tool efficiently provides an exploratory analysis by calculating enrichment of multiple gene sets, associated with multiple non-coding elements (peak sets), at multiple genomic distances, and within topologically associated domains. We apply PEGS to gene sets derived from gene expression studies, and genomic intervals from corresponding ChIP-seq and ATAC-seq experiments to derive biologically meaningful results. We also demonstrate an extended application to tissue-specific gene sets and publicly available GWAS data, to find enrichment of sleep trait associated SNPs in relation to tissue-specific gene expression profiles.