PLoS ONE (Jan 2020)
A consistent approach to the genotype encoding problem in a genome-wide association study of continuous phenotypes.
Abstract
In this study, we propose a hypothesis test that is robust to the choice of genotype encoding in genome-wide association analyses of continuous traits. When population stratification is corrected for with a method based on principal component analysis, ordinally (or categorically) encoded genotypes are adjusted and become continuous values. Because of this adjustment, the result of a conventional association test, such as the test of Pearson's correlation coefficient, depends on how the genotypes were encoded. To overcome this shortcoming, we propose a non-parametric test based on Kendall's tau. Because Kendall's tau measures the association between the ranks, rather than the values, of the adjusted genotypes and phenotypes, Kendall's test can be more robust than Pearson's test to different genotype encodings. We assessed the robustness of Kendall's test and compared it with that of Pearson's test in terms of the difference in p-values obtained under different genotype encodings. Using both simulated and real data sets, we demonstrated that Kendall's test was more robust than Pearson's test to the choice of genotype encoding. The proposed method is applicable to a broad range of topics in population genetics and comparative genomics in which novel genetic variants are associated with traits. This study may also encourage a more cautious approach to genotype encoding in numerical analyses.
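The encoding dependence described above can be illustrated with a minimal sketch; it is not the authors' implementation. It makes several assumptions that do not come from the abstract: a single simulated covariate stands in for a top principal component, both genotype and phenotype are adjusted by regressing out that covariate, and the two encodings (`additive` as 0/1/2 and `skewed` as 0/1/4) are hypothetical choices picked only because one is not a linear rescaling of the other. All function names are illustrative. The sketch reports correlation statistics only, not p-values.

```python
import random
from itertools import combinations

# Hypothetical encodings of the minor-allele count (0, 1, or 2 copies).
ENCODINGS = {
    "additive": {0: 0.0, 1: 1.0, 2: 2.0},
    "skewed": {0: 0.0, 1: 1.0, 2: 4.0},
}

def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def kendall_tau_b(x, y):
    """Kendall's tau-b (tie-corrected), computed over all pairs in O(n^2)."""
    conc = disc = tie_x = tie_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: contributes to neither term
        if dx == 0:
            tie_x += 1
        elif dy == 0:
            tie_y += 1
        elif dx * dy > 0:
            conc += 1
        else:
            disc += 1
    return (conc - disc) / ((conc + disc + tie_x) * (conc + disc + tie_y)) ** 0.5

def residualize(values, pc):
    """Residuals of a least-squares fit of `values` on `pc` plus an intercept."""
    n = len(values)
    mv, mp = sum(values) / n, sum(pc) / n
    slope = (sum((p - mp) * (v - mv) for p, v in zip(pc, values))
             / sum((p - mp) ** 2 for p in pc))
    return [(v - mv) - slope * (p - mp) for v, p in zip(values, pc)]

def simulate(n=300, seed=1):
    """Toy stratified sample: allele frequency and phenotype both depend on pc."""
    rng = random.Random(seed)
    pc, geno, pheno = [], [], []
    for _ in range(n):
        s = rng.gauss(0, 1)                  # stand-in for a top PC score
        freq = 0.5 if s > 0 else 0.2         # assumed subpopulation frequencies
        g = (rng.random() < freq) + (rng.random() < freq)  # allele count 0..2
        pc.append(s)
        geno.append(g)
        pheno.append(0.3 * g + 0.5 * s + rng.gauss(0, 1))
    return pc, geno, pheno

def compare_encodings(n=300, seed=1):
    """Return {encoding: (pearson_r, kendall_tau_b)} on adjusted values."""
    pc, geno, pheno = simulate(n, seed)
    y_adj = residualize(pheno, pc)
    results = {}
    for name, code in ENCODINGS.items():
        x_adj = residualize([code[g] for g in geno], pc)
        results[name] = (pearson_r(x_adj, y_adj), kendall_tau_b(x_adj, y_adj))
    return results

for name, (r, tau) in compare_encodings().items():
    print(f"{name:9s} pearson r = {r:+.4f}   kendall tau-b = {tau:+.4f}")
```

On the raw (unadjusted) genotypes, Kendall's tau is identical under both encodings, because one is a monotone transformation of the other. After adjustment the two sets of residuals are no longer guaranteed to be monotonically related, so neither statistic is exactly invariant. The sketch only lets the size of the encoding effect on each statistic be compared for a given simulated sample.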