BMC Bioinformatics (May 2011)

A scan statistic to extract causal gene clusters from case-control genome-wide rare CNV data

  • Scherer Stephen W,
  • Pinto Dalila,
  • Tango Toshiro,
  • Takahashi Kunihiko,
  • Nishiyama Takeshi,
  • Takami Satoshi,
  • Kishino Hirohisa

DOI
https://doi.org/10.1186/1471-2105-12-205
Journal volume & issue
Vol. 12, no. 1
p. 205

Abstract

Background

Several statistical tests have been developed for analyzing genome-wide association data by incorporating gene pathway information in terms of gene sets. Using these methods, hundreds of gene sets are typically tested, and the tested gene sets often overlap. This overlapping greatly increases the probability of generating false positives, and the results obtained are difficult to interpret, particularly when many gene sets show statistical significance.

Results

We propose a flexible statistical framework to circumvent these problems. Inspired by spatial scan statistics for detecting clustering of disease occurrence in the field of epidemiology, we developed a scan statistic to extract disease-associated gene clusters from a whole gene pathway. Extracting one or a few significant gene clusters from a global pathway limits the overall false positive probability, which results in increased statistical power, and facilitates the interpretation of test results. In the present study, we applied our method to genome-wide association data for rare copy-number variations, which have been strongly implicated in common diseases. Application of our method to a simulated dataset demonstrated the high accuracy of this method in detecting disease-associated gene clusters in a whole gene pathway.

Conclusions

The scan statistic approach proposed here shows a high level of accuracy in detecting gene clusters in a whole gene pathway. This study has provided a sound statistical framework for analyzing genome-wide rare CNV data by incorporating topological information on the gene pathway.
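To make the idea of scanning a pathway for a disease-associated gene cluster concrete, the following is a minimal Python sketch. It is not the authors' statistic. It assumes a Kulldorff-style Bernoulli likelihood ratio borrowed from spatial epidemiology, candidate windows defined as graph neighbourhoods of a fixed radius around each gene, and a Monte Carlo test that permutes case/control labels. All function names, the toy pathway, and the toy carrier data are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import deque


def bernoulli_llr(c_in, n_in, c_tot, n_tot):
    """Log likelihood ratio for a window holding c_in cases among n_in carriers,
    against c_tot cases among n_tot individuals overall (assumed Bernoulli model).
    Returns 0 unless the case fraction inside exceeds the fraction outside."""
    c_out, n_out = c_tot - c_in, n_tot - n_in
    if n_in == 0 or n_out == 0 or c_in / n_in <= c_out / n_out:
        return 0.0

    def xlogy(x, y):
        return x * math.log(y) if x > 0 else 0.0

    def loglik(c, n):
        return xlogy(c, c / n) + xlogy(n - c, (n - c) / n)

    return loglik(c_in, n_in) + loglik(c_out, n_out) - loglik(c_tot, n_tot)


def graph_ball(adj, center, radius):
    """Genes within `radius` edges of `center`, found by breadth-first search."""
    seen = {center}
    queue = deque([(center, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == radius:
            continue
        for nb in adj.get(node, ()):
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, dist + 1))
    return frozenset(seen)


def candidate_windows(adj, max_radius):
    """All distinct neighbourhoods of radius 0..max_radius, in a fixed order."""
    windows = {graph_ball(adj, g, r) for g in adj for r in range(max_radius + 1)}
    return sorted(windows, key=lambda w: (len(w), sorted(w)))


def max_scan(windows, hits, is_case):
    """Return (best LLR, best window) over all candidate windows."""
    n_tot = len(is_case)
    c_tot = sum(is_case.values())
    best_llr, best_window = 0.0, None
    for w in windows:
        # Individuals carrying a rare CNV in at least one gene of the window.
        carriers = set().union(*(hits.get(g, set()) for g in w))
        c_in = sum(is_case[i] for i in carriers)
        llr = bernoulli_llr(c_in, len(carriers), c_tot, n_tot)
        if llr > best_llr:
            best_llr, best_window = llr, w
    return best_llr, best_window


def scan_test(adj, hits, is_case, max_radius=1, n_perm=199, seed=0):
    """Most likely cluster, its LLR, and a Monte Carlo p-value for the maximum."""
    rng = random.Random(seed)
    windows = candidate_windows(adj, max_radius)
    observed, cluster = max_scan(windows, hits, is_case)
    ids = list(is_case)
    labels = [is_case[i] for i in ids]
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any association between status and CNVs
        if max_scan(windows, hits, dict(zip(ids, labels)))[0] >= observed:
            exceed += 1
    return cluster, observed, (exceed + 1) / (n_perm + 1)


# Toy pathway: a chain of five hypothetical genes A-B-C-D-E.
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
# Individuals 0..19 are cases, 20..39 are controls.
is_case = {i: i < 20 for i in range(40)}
# Gene -> individuals whose rare CNV overlaps that gene (invented data).
hits = {"A": {0, 1, 2, 3, 4}, "B": {5, 6, 7, 8, 9, 20},
        "C": {10, 21}, "D": {11, 22}, "E": {23}}

cluster, llr, p_value = scan_test(adj, hits, is_case)
print(sorted(cluster), round(llr, 2), p_value)
```

Because only the maximum over all windows is compared with its permutation distribution, a single test is carried out for the whole pathway, which is how a scan statistic sidesteps the multiplicity created by testing many overlapping gene sets separately.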