PLoS ONE (Jan 2014)
Tandem repeats and G-rich sequences are enriched at human CNV breakpoints.
Abstract
Chromosome breakage in germline and somatic genomes gives rise to copy number variation (CNV) responsible for genomic disorders and tumorigenesis. DNA sequence is known to play an important role in breakage at chromosome fragile sites; however, the sequences susceptible to double-strand breaks (DSBs) underlying CNV formation are largely unknown. Here we analyze 140 germline CNV breakpoints from 116 individuals to identify DNA sequences enriched at breakpoint loci compared to 2800 simulated control regions. We find that, overall, CNV breakpoints are enriched in tandem repeats and sequences predicted to form G-quadruplexes. G-rich repeats are overrepresented at terminal deletion breakpoints, which may be important for the addition of a new telomere. Interstitial deletions and duplication breakpoints are enriched in Alu repeats that in some cases mediate non-allelic homologous recombination (NAHR) between the two sides of the rearrangement. CNV breakpoints are enriched in certain classes of repeats that may play a role in DNA secondary structure, DSB susceptibility and/or DNA replication errors.