Genes (May 2020)

Association Analysis and Meta-Analysis of Multi-Allelic Variants for Large-Scale Sequence Data

  • Yu Jiang,
  • Sai Chen,
  • Xingyan Wang,
  • Mengzhen Liu,
  • William G. Iacono,
  • John K. Hewitt,
  • John E. Hokanson,
  • Kenneth Krauter,
  • Markku Laakso,
  • Kevin W. Li,
  • Sharon M. Lutz,
  • Matthew McGue,
  • Anita Pandit,
  • Gregory J.M. Zajac,
  • Michael Boehnke,
  • Goncalo R. Abecasis,
  • Scott I. Vrieze,
  • Bibo Jiang,
  • Xiaowei Zhan,
  • Dajiang J. Liu

DOI
https://doi.org/10.3390/genes11050586
Journal volume & issue
Vol. 11, no. 5
p. 586

Abstract


There is great interest in understanding the impact of rare variants on human diseases using large sequencing datasets. In deep sequencing datasets of more than 10,000 samples, about 10% of variant sites are observed to be multi-allelic. Many multi-allelic variants have been shown to be functional and disease-relevant. Proper analysis of multi-allelic variants is critical to the success of a sequencing study, but existing methods do not properly handle them and can produce highly misleading association results. We discuss practical issues and methods for encoding multi-allelic sites, conducting single-variant and gene-level association analyses, and performing meta-analysis of multi-allelic variants. We evaluated these methods through extensive simulations and a large meta-analysis of ~18,000 samples for the cigarettes-per-day phenotype. We showed that our joint modeling approach provided unbiased estimates of genetic effects, greatly improved the power of single-variant association tests among methods that can properly estimate allele effects, and enhanced gene-level tests over existing approaches. Software packages implementing these methods are available online.
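The abstract does not spell out how a multi-allelic site is encoded. As an illustration only, here is a minimal sketch of one common representation in the spirit of the joint modeling it mentions: each genotype becomes one dosage column per alternative (ALT) allele, and all columns are fitted together in a single ordinary-least-squares model of a quantitative trait. It assumes VCF-style genotype strings (e.g. "1/2"), ignores missing calls and covariates, and the function names `allele_dosages`, `solve`, and `joint_effects` are hypothetical. It is not the authors' software.

```python
from typing import List


def allele_dosages(genotypes: List[str], n_alt: int) -> List[List[int]]:
    """Encode each genotype as one dosage column per ALT allele.

    Assumes VCF-style strings such as "0/1" or "1|2"; allele index 0 is REF.
    Missing calls (".") are not handled in this sketch.
    """
    rows = []
    for gt in genotypes:
        row = [0] * n_alt
        for allele in gt.replace("|", "/").split("/"):
            idx = int(allele)
            if idx > 0:  # count copies of ALT allele idx in column idx - 1
                row[idx - 1] += 1
        rows.append(row)
    return rows


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def joint_effects(dosages: List[List[int]], y: List[float]) -> List[float]:
    """OLS fit of y on an intercept plus all ALT-allele dosages at once.

    Returns [intercept, beta_alt1, beta_alt2, ...] via the normal equations.
    """
    X = [[1.0] + [float(d) for d in row] for row in dosages]
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


if __name__ == "__main__":
    genotypes = ["0/0", "0/1", "1/1", "0/2", "2/2", "1/2", "0|1", "0|2"]
    d = allele_dosages(genotypes, n_alt=2)
    y = [1.0 + 2.0 * a1 - 0.5 * a2 for a1, a2 in d]  # noise-free toy trait
    print(joint_effects(d, y))  # approximately [1.0, 2.0, -0.5]
```

Keeping one column per ALT allele lets the two alternative alleles carry different (even opposite-sign) effects. Collapsing every non-reference allele into a single count would instead force one shared effect for the whole site.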

Keywords