BMC Bioinformatics (Jan 2009)

MegaSNPHunter: a learning approach to detect disease predisposition SNPs and high level interactions in genome wide association study

  • Xue Hong,
  • Tang Nelson LS,
  • Yang Qiang,
  • Yang Can,
  • Wan Xiang,
  • Yu Weichuan

DOI
https://doi.org/10.1186/1471-2105-10-13
Journal volume & issue
Vol. 10, no. 1
p. 13

Abstract

Read online

Abstract Background The interactions of multiple single nucleotide polymorphisms (SNPs) are highly hypothesized to affect an individual's susceptibility to complex diseases. Although many works have been done to identify and quantify the importance of multi-SNP interactions, few of them could handle the genome wide data due to the combinatorial explosive search space and the difficulty to statistically evaluate the high-order interactions given limited samples. Results Three comparative experiments are designed to evaluate the performance of MegaSNPHunter. The first experiment uses synthetic data generated on the basis of epistasis models. The second one uses a genome wide study on Parkinson disease (data acquired by using Illumina HumanHap300 SNP chips). The third one chooses the rheumatoid arthritis study from Wellcome Trust Case Control Consortium (WTCCC) using Affymetrix GeneChip 500K Mapping Array Set. MegaSNPHunter outperforms the best solution in this area and reports many potential interactions for the two real studies. Conclusion The experimental results on both synthetic data and two real data sets demonstrate that our proposed approach outperforms the best solution that is currently available in handling large-scale SNP data both in terms of speed and in terms of detection of potential interactions that were not identified before. To our knowledge, MegaSNPHunter is the first approach that is capable of identifying the disease-associated SNP interactions from WTCCC studies and is promising for practical disease prognosis.