The challenge of detecting genotype-by-methylation interaction: GAW20
Mariza de Andrade,
E. Warwick Daw,
Aldi T. Kraja,
Virginia Fisher,
Lan Wang,
Ke Hu,
Jing Li,
Razvan Romanescu,
Jenna Veenstra,
Rui Sun,
Haoyi Weng,
Wenda Zhou
Affiliations
Mariza de Andrade
Division of Biomedical Statistics and Informatics, Department of Health Sciences Research
E. Warwick Daw
Division of Statistical Genomics, Department of Genetics, Center for Genome Sciences and Systems Biology, Washington University School of Medicine
Aldi T. Kraja
Division of Statistical Genomics, Department of Genetics, Center for Genome Sciences and Systems Biology, Washington University School of Medicine
Virginia Fisher
Department of Biostatistics, Boston University School of Public Health, Boston
Lan Wang
Department of Biostatistics, Boston University School of Public Health, Boston
Ke Hu
Department of Electrical Engineering and Computer Science, Case Western Reserve University
Jing Li
Department of Electrical Engineering and Computer Science, Case Western Reserve University
Razvan Romanescu
Lunenfeld-Tanenbaum Research Institute, Sinai Health System, University of Toronto
Jenna Veenstra
Department of Biology, Dordt College
Rui Sun
Division of Biostatistics, Centre for Clinical Research and Biostatistics, JC School of Public Health and Primary Care, the Chinese University of Hong Kong
Haoyi Weng
Division of Biostatistics, Centre for Clinical Research and Biostatistics, JC School of Public Health and Primary Care, the Chinese University of Hong Kong
Abstract
Background
GAW20 working group 5 brought together researchers who contributed 7 papers evaluating methods to detect genetic-by-epigenetic interactions. GAW20 distributed real data from the Genetics of Lipid Lowering Drugs and Diet Network (GOLDN) study, including single-nucleotide polymorphism (SNP) markers, methylation (cytosine-phosphate-guanine [CpG]) markers, and phenotype information on up to 995 individuals. In addition, a simulated data set based on the real data was provided.
Results
The 7 contributed papers analyzed these data sets with a range of statistical methods, including generalized linear mixed models, mediation analysis, machine learning, the W-test, and sparsity-inducing regularized regression. These methods generally appeared to perform well. Several papers confirmed a number of causative SNPs, either in the large number of simulation sets or in the real data on chromosome 11. Findings were also reported for different SNPs, CpG sites, and SNP–CpG site interaction pairs.
Conclusions
In the simulation (200 replications), power appeared generally good for large interaction effects, but detecting smaller effects will require larger studies or consortium collaboration to achieve sufficient power.