Frontiers in Genetics (Jul 2020)

Whole Genome Sequencing of Four Representatives From the Admixed Population of the United Arab Emirates

  • Gihan Daw Elbait,
  • Andreas Henschel,
  • Guan K. Tay,
  • Habiba S. Al Safar

DOI
https://doi.org/10.3389/fgene.2020.00681
Journal volume & issue
Vol. 11

Abstract

Whole genome sequences (WGS) of four nationals of the United Arab Emirates (UAE) were completed at an average coverage of 33X and are described here. The selection of suitable subpopulation representatives was informed by a preceding comprehensive population structure analysis. Representatives were chosen based on their central location within their subpopulation in a principal component analysis (PCA) and on the degree to which they were admixed. Novel genomic variations among the different subgroups of the UAE population are reported. Specifically, the WGS analysis identified 4,161,067–4,798,806 variants per sample, of which approximately 80% were single nucleotide polymorphisms (SNPs) and 20% were insertions or deletions (indels). On average, 2.75% of these variants were novel according to dbSNP (build 151). This is the first report of structural variants (SVs) from WGS data of UAE nationals. Between 15,677 and 20,339 SVs were called per sample, of which around 13.5% were novel. The four samples shared 1,399,178 variants, and each carried distinct variants as follows: 1,085,524 (for the individual denoted as UAE S011), 1,228,559 (UAE S012), 791,072 (UAE S013), and 906,818 (UAE S014). These results show a previously unappreciated population diversity in the region. The synergy of WGS and genotype array data was demonstrated by annotating the WGS variants with 2.3 million allele frequencies for the local population derived from the array platform. This novel approach of combining the breadth of array data with the depth of WGS has guided the choice of population genetic representatives and provides complementary, regionalized allele frequency annotation for new genomes comprising millions of loci.
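
The shared and distinct variant counts reported above can be thought of as set operations over per-sample variant calls. The following is a minimal sketch, not the authors' pipeline: it assumes each variant is keyed by (chromosome, position, reference allele, alternate allele), and it assumes "distinct" means a variant seen in exactly one of the samples, which the abstract does not state explicitly. The function name and the toy data are hypothetical.

```python
from collections import Counter


def shared_and_private(variants_by_sample):
    """Split variants into those shared by all samples and those private to one.

    variants_by_sample maps a sample name to a set of variant keys.
    Returns (shared, private): shared is the set of variants present in every
    sample; private maps each sample to the variants seen in no other sample
    (an assumed reading of "distinct").
    """
    # Count in how many samples each variant occurs.
    counts = Counter(
        variant
        for variants in variants_by_sample.values()
        for variant in set(variants)
    )
    n_samples = len(variants_by_sample)
    shared = {v for v, c in counts.items() if c == n_samples}
    private = {
        sample: {v for v in set(variants) if counts[v] == 1}
        for sample, variants in variants_by_sample.items()
    }
    return shared, private


# Toy data (hypothetical), keyed as (chrom, pos, ref, alt).
toy = {
    "S011": {("1", 100, "A", "G"), ("1", 200, "C", "T"), ("2", 50, "G", "A")},
    "S012": {("1", 100, "A", "G"), ("2", 75, "T", "C")},
    "S013": {("1", 100, "A", "G"), ("1", 200, "C", "T")},
    "S014": {("1", 100, "A", "G"), ("3", 10, "G", "GA")},
}

shared, private = shared_and_private(toy)
print(len(shared), {sample: len(v) for sample, v in private.items()})
```

In this toy input, a variant carried by two or three samples (such as the one at position 200) is neither shared nor private, which is why the shared and distinct counts in a real data set need not add up to any one sample's total.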

Keywords