HGG Advances (Apr 2025)
Genome-wide maps of highly-similar intrachromosomal repeats that can mediate ectopic recombination in three human genome assemblies
Abstract
Summary: Repeated sequences spread throughout the genome play important roles in shaping the structure of chromosomes and in facilitating the generation of new genomic variation through structural rearrangements. Several mechanisms of structural variation formation use shared nucleotide similarity between repeated sequences as a substrate for ectopic recombination. We performed genome-wide analyses of direct and inverted intrachromosomal repeated sequence pairs of 200 bp or longer with 80% or greater sequence identity in three human genome assemblies: GRCh37, GRCh38, and T2T-CHM13. Overall, the composition and distribution of the direct and inverted repeated sequences identified were similar among the three assemblies, involving 13%–15% of the haploid genome, with a larger, albeit not significantly larger, number of repeated sequences in T2T-CHM13. Interestingly, the majority of repeated sequences are below 1 kb in length, with a median identity of 84.2%, highlighting the potential relevance of smaller, less identical repeats, such as Alu-Alu pairs, for ectopic recombination. We cross-referenced the identified repeated sequences with protein-coding genes to identify those at risk of being involved in genomic rearrangements. Olfactory receptor and immune response genes were enriched among those impacted.
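
The selection criteria stated in the summary (intrachromosomal pairs, 200 bp or longer, 80% or greater identity, split into direct and inverted orientation) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `RepeatPair` record, its field names, and the `classify_pair` helper are hypothetical, and the sketch assumes that each pair has already been aligned, that a single aligned length and percent identity are available for it, and that "direct" means both copies lie on the same strand while "inverted" means opposite strands.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds taken from the summary; the names are illustrative.
MIN_LENGTH_BP = 200
MIN_IDENTITY_PCT = 80.0


@dataclass(frozen=True)
class RepeatPair:
    """Hypothetical record for one pairwise repeat alignment."""
    chrom_a: str
    strand_a: str      # "+" or "-"
    chrom_b: str
    strand_b: str      # "+" or "-"
    length_bp: int     # aligned length of the repeat (assumed precomputed)
    identity_pct: float


def classify_pair(pair: RepeatPair) -> Optional[str]:
    """Return "direct" or "inverted" for a qualifying pair, else None."""
    # Only intrachromosomal pairs are considered.
    if pair.chrom_a != pair.chrom_b:
        return None
    # Apply the length and identity thresholds.
    if pair.length_bp < MIN_LENGTH_BP or pair.identity_pct < MIN_IDENTITY_PCT:
        return None
    # Assumption: same strand means direct, opposite strands mean inverted.
    return "direct" if pair.strand_a == pair.strand_b else "inverted"


if __name__ == "__main__":
    # Made-up example pairs, not data from the study.
    examples = [
        RepeatPair("chr1", "+", "chr1", "+", 310, 84.2),
        RepeatPair("chr1", "+", "chr1", "-", 1500, 97.0),
        RepeatPair("chr1", "+", "chr2", "+", 5000, 99.0),
        RepeatPair("chr7", "-", "chr7", "-", 150, 95.0),
    ]
    for pair in examples:
        print(pair.chrom_a, pair.chrom_b, classify_pair(pair))
```

Under these assumptions, a same-chromosome pair that meets both thresholds is labeled by orientation, and any interchromosomal, short, or low-identity pair is discarded.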