Scientific Reports (Dec 2021)
Sequence variations, flanking region mutations, and allele frequency at 31 autosomal STRs in the central Indian population by next generation sequencing (NGS)
Abstract
Abstract Capillary electrophoresis-based analysis does not reflect the exact allele number variation at the STR loci due to the non-availability of the data on sequence variation in the repeat region and the SNPs in flanking regions. Herein, this study reports the length-based and sequence-based allelic data of 138 central Indian individuals at 31 autosomal STR loci by NGS. The sequence data at each allele was compared to the reference hg19 sequence. The length-based allelic results were found in concordance with the CE-based results. 20 out of 31 autosomal STR loci showed an increase in the number of alleles by the presence of sequence variation and/or SNPs in the flanking regions. The highest gain in the heterozygosity and allele numbers was observed in D5S2800, D1S1656, D16S539, D5S818, and vWA. rs25768 (A/G) at D5S818 was found to be the most frequent SNP in the studied population. Allele no. 15 of D3S1358, allele no. 19 of D2S1338, and allele no. 22 of D12S391 showed 5 isoalleles each with the same size and with different intervening sequences. Length-based determination of the alleles showed Penta E to be the most useful marker in the central Indian population among 31 STRs studied; however, sequence-based analysis advocated D2S1338 to be the most useful marker in terms of various forensic parameters. Population genetics analysis showed a shared genetic ancestry of the studied population with other Indian populations. This first-ever study to the best of our knowledge on sequence-based STR analysis in the central Indian population is expected to prove the use of NGS in forensic case-work and in forensic DNA laboratories.