BMC Bioinformatics (Oct 2021)

Wavelet Screening: a novel approach to analyzing GWAS data

  • William R. P. Denault,
  • Håkon K. Gjessing,
  • Julius Juodakis,
  • Bo Jacobsson,
  • Astanand Jugessur

DOI
https://doi.org/10.1186/s12859-021-04356-5
Journal volume & issue
Vol. 22, no. 1
pp. 1 – 20

Abstract

Background

Traditional methods for single-variant genome-wide association studies (GWAS) incur a substantial multiple-testing burden because they test a vast number of single-nucleotide polymorphisms (SNPs) for association simultaneously. Moreover, by ignoring more complex joint effects of nearby SNPs within a region, these methods disregard the genomic context of an association with the outcome.

Results

To address these shortcomings, we present a more powerful method for GWAS, called 'Wavelet Screening' (WS), that greatly reduces the number of tests to be performed. WS uses a sliding-window approach based on wavelets to sequentially screen the entire genome for associations. Wavelets are oscillatory functions that are useful for analyzing the local frequency and time behavior of signals; a signal can thus be decomposed into components at different scales, each analyzed separately. Here, we treat a sequence of SNPs as a genetic signal and, for each screened region, transform that signal into wavelet space. The null and alternative hypotheses are modeled using the posterior distribution of the wavelet coefficients. WS is further enhanced by using additional information from the regression coefficients and by exploiting the pyramidal structure of wavelets. Through simulations, we show that for genetic signals more complex than single-SNP associations, WS provides a substantial gain in power over both traditional GWAS modeling and another popular regional association test, the SNP-set (Sequence) Kernel Association Test (SKAT). To demonstrate feasibility, we applied WS to a large Norwegian cohort (N = 8006) with genotype data and information on gestational duration.

Conclusions

WS is a powerful and versatile approach to analyzing whole-genome data and can readily be extended to other omics data types. Given its broader focus on the genomic context of an association, WS may provide additional insight into trait etiology by revealing genes and loci that previous efforts might have missed.
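The abstract describes screening the genome with sliding windows, treating the SNPs in each window as a signal, and mapping that signal into wavelet space, where coefficients are organized by scale (the "pyramidal structure"). The following is a minimal sketch of that decomposition step only. It assumes a Haar wavelet, a window width that is a power of two, and one individual's genotype dosages (0, 1, 2) as the signal. The function names, window width, and step are hypothetical, and the Bayesian modeling of the coefficients under the null and alternative hypotheses is omitted. This is not the authors' implementation.

```python
import numpy as np


def haar_dwt(signal):
    """Orthonormal Haar decomposition of a signal whose length is a power of two.

    Returns (smooth, details), where details[s] holds the 2**s detail
    coefficients at scale s (s = 0 is the coarsest scale).
    """
    x = np.asarray(signal, dtype=float)
    n = x.size
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    details = {}
    scale = int(np.log2(n))
    while x.size > 1:
        even, odd = x[0::2], x[1::2]
        scale -= 1
        details[scale] = (even - odd) / np.sqrt(2.0)  # local differences
        x = (even + odd) / np.sqrt(2.0)               # local averages
    return x[0], details


def sliding_windows(n_snps, width, step):
    """Yield (start, stop) SNP index pairs for overlapping windows (hypothetical helper)."""
    for start in range(0, n_snps - width + 1, step):
        yield start, start + width


def energy_by_scale(details):
    """Sum of squared detail coefficients at each scale."""
    return {s: float(np.sum(c ** 2)) for s, c in sorted(details.items())}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical genotype dosages (0, 1, 2) for one individual at 64 SNPs.
    genotypes = rng.integers(0, 3, size=64)
    for start, stop in sliding_windows(genotypes.size, width=16, step=8):
        smooth, details = haar_dwt(genotypes[start:stop])
        print(start, stop, round(float(smooth), 3), energy_by_scale(details))
```

Because the Haar transform is orthonormal, the squared smooth coefficient plus the per-scale energies sum to the squared norm of the window. A block of adjacent SNPs that share a value shows up at coarse scales. For example, `haar_dwt([1, 1, 1, 1, 0, 0, 0, 0])` gives a single nonzero detail coefficient, √2 at scale 0, and zeros at the finer scales. This is one intuition for why a scale-aware representation can capture joint effects of nearby SNPs that single-SNP tests treat separately.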

Keywords