PLoS ONE (Jan 2020)

A rank-based normalization method with the fully adjusted full-stage procedure in genetic association studies.

  • Li-Chu Chien

DOI: https://doi.org/10.1371/journal.pone.0233847
Journal volume & issue: Vol. 15, no. 6, p. e0233847

Abstract

In genetic epidemiology, studies of genotype-phenotype associations have made significant contributions to the genetics of complex human traits. These studies depend on specialized statistical methods for uncovering associations between traits and genetic variants, both common and rare. Analyses of such studies often must also account for potential confounding factors, such as social and environmental conditions. Multiple linear regression is the most widely used type of regression analysis when the outcome of interest is a quantitative trait. Many statistical tests for identifying genotype-phenotype associations with linear regression assume that the traits (or the regression residuals) follow a normal distribution. In genomic research, the rank-based inverse normal transformation (INT) is one of the most popular approaches for obtaining normally distributed traits (or residuals). Many researchers believe that applying the INT to non-normal traits (or residuals) is required for valid inference, because phenotypic (or residual) outliers and non-normality strongly affect both type I error rate control and statistical power, especially in rare-variant association testing. Here we propose a test of the association between rare variants and a quantitative trait that uses a fully adjusted full-stage INT. Using simulations, we show that the fully adjusted full-stage INT is more appropriate than existing INT methods, such as the fully adjusted two-stage INT and the INT-based omnibus test, for testing genotype-phenotype associations with rare variants, especially when genotypes are uncorrelated with covariates.
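To make the rank-based INT concrete, the following Python sketch transforms a skewed trait into approximately normal scores. It assumes the Blom offset c = 3/8, a common choice that the abstract does not specify, and gives tied values their average rank. It covers only the transformation step shared by INT-based procedures; the fully adjusted full-stage procedure itself is not reproduced. The function names `average_ranks` and `rank_inverse_normal` are hypothetical.

```python
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j correspond to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_inverse_normal(values, c=3.0 / 8.0):
    """Rank-based inverse normal transformation (assumed Blom offset c = 3/8).

    Maps each value to Phi^{-1}((r - c) / (n - 2c + 1)), where r is its rank
    and Phi^{-1} is the standard normal quantile function.
    """
    n = len(values)
    phi_inv = NormalDist().inv_cdf
    return [phi_inv((r - c) / (n - 2 * c + 1)) for r in average_ranks(values)]


if __name__ == "__main__":
    skewed = [0.1, 0.2, 0.2, 0.5, 1.3, 4.0, 25.0]
    print([round(z, 3) for z in rank_inverse_normal(skewed)])
```

Because the scores depend only on ranks, the outlying value 25.0 receives the same bounded score as any other largest observation would, which illustrates why the INT limits the influence of phenotypic outliers.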
The fully adjusted full-stage INT retains the advantages of the fully adjusted two-stage INT while ameliorating its problems in the analysis of rare variants when the trait is non-normal. We also present theoretical results supporting these properties. In addition, two other available methods for non-normal traits, the quantile/median regression method and the Yeo-Johnson power transformation, are included in the simulations for comparison.
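For reference, the Yeo-Johnson power transformation named among the comparison methods can be sketched in Python as below. The sketch applies a fixed λ (`lam`) to each value; in practice λ is usually estimated from the data, for example by maximum likelihood, so the fixed value and the function name `yeo_johnson` are illustrative assumptions rather than the paper's settings.

```python
import math


def yeo_johnson(y, lam):
    """Yeo-Johnson power transformation of a single value y for a fixed lambda.

    Nonnegative and negative values use different branches so that the
    transformation is defined on the whole real line.
    """
    if y >= 0:
        if lam != 0:
            return ((y + 1) ** lam - 1) / lam
        return math.log1p(y)  # limit of the power branch as lam -> 0
    if lam != 2:
        return -(((1 - y) ** (2 - lam)) - 1) / (2 - lam)
    return -math.log1p(-y)  # limit of the power branch as lam -> 2


if __name__ == "__main__":
    trait = [-2.0, -0.5, 0.0, 1.0, 3.0, 20.0]
    print([round(yeo_johnson(v, 0.5), 3) for v in trait])
```

With `lam = 1` both branches reduce to the identity, which gives a quick sanity check; unlike the rank-based INT, the result still depends on the actual magnitudes of the trait values.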