Nature Communications (Dec 2024)

Short tandem repeats delineate gene bodies across eukaryotes

  • William B. Reinar,
  • Anders K. Krabberød,
  • Vilde O. Lalun,
  • Melinka A. Butenko,
  • Kjetill S. Jakobsen

DOI
https://doi.org/10.1038/s41467-024-55276-w
Journal volume & issue
Vol. 15, no. 1
pp. 1 – 13

Abstract

Read online

Abstract Short tandem repeats (STRs) have emerged as important and hypermutable sites where genetic variation correlates with gene expression in plant and animal systems. Recently, it has been shown that a broad range of transcription factors (TFs) are affected by STRs near or in the DNA target binding site. Despite this, the distribution of STR motif repetitiveness in eukaryote genomes is still largely unknown. Here, we identify monomer and dimer STR motif repetitiveness in 5.1 billion 10-bp windows upstream of translation starts and downstream of translation stops in 25 million genes spanning 1270 species across the eukaryotic Tree of Life. We report that all surveyed genomes have gene-proximal shifts in motif repetitiveness. Within genomes, variation in gene-proximal repetitiveness landscapes correlated to the function of genes; genes with housekeeping functions were depleted in upstream and downstream repetitiveness. Furthermore, the repetitiveness landscapes correlated with TF binding sites, indicating that gene function has evolved in conjunction with cis-regulatory STRs and TFs that recognize repetitive sites. These results suggest that the hypermutability inherent to STRs is canalized along the genome sequence and contributes to regulatory and eco-evolutionary dynamics in all eukaryotes.