Microsatellites used in forensics are in regions enriched for trait-associated variants
Vivian Link,
Yuómi Jhony A. Zavaleta,
Rochelle-Jan Reyes,
Linda Ding,
Judy Wang,
Rori V. Rohlfs,
Michael D. Edge
Affiliations
Vivian Link
Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
Yuómi Jhony A. Zavaleta
Department of Biology, San Francisco State University, San Francisco, CA, USA
Rochelle-Jan Reyes
Department of Biology, San Francisco State University, San Francisco, CA, USA
Linda Ding
Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
Judy Wang
Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
Rori V. Rohlfs
Department of Biology, San Francisco State University, San Francisco, CA, USA; Department of Data Science and Institute of Ecology and Evolution, University of Oregon, Eugene, OR, USA; Corresponding author
Michael D. Edge
Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA; Corresponding author
Summary: The 20 short tandem repeat (STR) loci of the Combined DNA Index System (CODIS) are the basis of the vast majority of forensic genetics in the United States. One argument for permissive rules about the collection of CODIS genotypes is that the CODIS loci are thought to contain little information about ancestry or traits. However, in the past 20 years, a growing field has identified hundreds of thousands of genotype-trait associations. Here, we survey the landscape of such associations surrounding the CODIS loci as compared with non-CODIS STRs. Although this study cannot establish or quantify associations between CODIS genotypes and phenotypes, we find that the regions around the CODIS loci are enriched both for known pathogenic variants (> 90th percentile) and for trait-associated SNPs identified in genome-wide association studies (GWAS) (≥ 95th percentile in 10 kb and 100 kb flanking regions), compared with random sets of autosomal tetranucleotide-repeat STRs.
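The percentile comparison described in the summary can be illustrated with a minimal sketch. It assumes that each STR has already been assigned a count of trait-associated variants within a fixed flanking window, and it ranks the total for a target set against totals for randomly drawn background sets of the same size. The function names, the default number of random sets, and the use of a simple sum are illustrative assumptions, not the procedure used in the study.

```python
import random


def empirical_percentile(observed, null_values):
    """Return the percentage of null values that are <= the observed value."""
    at_or_below = sum(1 for v in null_values if v <= observed)
    return 100.0 * at_or_below / len(null_values)


def enrichment_percentile(target_counts, background_counts, n_sets=1000, seed=0):
    """Rank the target set's total against random background sets.

    target_counts: per-locus variant counts for the set of interest
        (hypothetical input, e.g. one count per CODIS locus).
    background_counts: per-locus variant counts for the comparison STRs
        (hypothetical input, e.g. non-CODIS tetranucleotide STRs).
    n_sets: number of random sets to draw (assumed value).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = sum(target_counts)
    set_size = len(target_counts)
    # Draw random sets without replacement, each the same size as the target set.
    null_totals = [
        sum(rng.sample(background_counts, set_size)) for _ in range(n_sets)
    ]
    return empirical_percentile(observed, null_totals)
```

Under this definition, a returned value of 95 or more would correspond to a "≥ 95th percentile" statement for the chosen window size. A real analysis would also need to specify how the background STRs are selected and how overlapping windows are handled, which this sketch leaves open.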