Mathematical Biosciences and Engineering (Jan 2020)
Examining the rare disease assumption used to justify HWE testing with control samples
Abstract
Many statistical methods for analyzing genetic data, such as those used in genome-wide association studies, assume Hardy-Weinberg equilibrium (HWE). Therefore, to use such methods, one must check whether the HWE assumption is valid. For a case-control study, researchers have recognized that Hardy-Weinberg proportions will be distorted if the marker being tested happens to be associated with the disease. To alleviate this problem, many studies carry out HWE testing on controls only. A number of papers in the literature have justified this practice by invoking the rare disease assumption without providing a rigorous theoretical basis for this justification. Even though many of the diseases studied today are common, whether it is justifiable to use controls to test for HWE when the disease is indeed rare remains an outstanding issue. In this study, we address the rare disease assumption as well as potential problems associated with testing for HWE using controls only, regardless of the prevalence of the disease. We carried out theoretical derivations and numerical studies; the latter were performed using simulated genotypes as well as data from the 1000 Genomes Project. The results from our study are striking: the type I error rate can be severely inflated, regardless of whether the disease being investigated is rare or common. This study shows that the common practice of using controls only to test for HWE will cause many genetic variants to be discarded erroneously, wasting valuable information and hindering the ability to detect disease-associated variants.
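The distortion of Hardy-Weinberg proportions among controls can be illustrated with a short calculation. The following is a minimal sketch, not the derivation used in this study: it assumes a biallelic marker in exact HWE in the general population, and the function names, the risk-allele frequency, and the penetrance values are all hypothetical choices made for illustration.

```python
def control_genotype_freqs(p, penetrances):
    """Genotype frequencies among controls (unaffected individuals).

    p           -- risk-allele frequency in the general population (assumed in HWE)
    penetrances -- P(disease | 0, 1, 2 copies of the risk allele); hypothetical values
    """
    q = 1.0 - p
    # Hardy-Weinberg proportions in the general population.
    population = [q * q, 2.0 * p * q, p * p]
    # Disease prevalence implied by the penetrances.
    prevalence = sum(g * f for g, f in zip(population, penetrances))
    # Bayes' rule: P(genotype | unaffected).
    controls = [g * (1.0 - f) / (1.0 - prevalence)
                for g, f in zip(population, penetrances)]
    return controls, prevalence


def hwe_departure(freqs):
    """Per-individual Pearson chi-square distance from Hardy-Weinberg proportions.

    Multiplying by the sample size gives the approximate noncentrality of a
    chi-square HWE test applied to a group with these genotype frequencies.
    """
    p = freqs[2] + 0.5 * freqs[1]  # allele frequency within this group
    q = 1.0 - p
    expected = [q * q, 2.0 * p * q, p * p]
    return sum((o - e) ** 2 / e for o, e in zip(freqs, expected))


if __name__ == "__main__":
    # Marker unrelated to disease: controls keep Hardy-Weinberg proportions.
    null_controls, _ = control_genotype_freqs(0.3, (0.05, 0.05, 0.05))
    # Hypothetical recessive effect: controls deviate from Hardy-Weinberg proportions.
    assoc_controls, prevalence = control_genotype_freqs(0.3, (0.01, 0.01, 0.5))
    print("prevalence:", prevalence)
    print("departure, no association:", hwe_departure(null_controls))
    print("departure, association:   ", hwe_departure(assoc_controls))
```

Under this simplified model the control group stays in HWE only when the non-disease probabilities satisfy (1 - f1)^2 = (1 - f0)(1 - f2), where f0, f1, f2 are the penetrances for 0, 1, and 2 copies of the risk allele. The sketch covers only departure caused by association and does not reproduce the type I error analysis reported in this study.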
Keywords