Frontiers in Genetics (Jan 2019)

Widespread Separation of the Polypyrimidine Tract From 3′ AG by G Tracts in Association With Alternative Exons in Metazoa and Plants

  • Hai Nguyen,
  • Jiuyong Xie

DOI
https://doi.org/10.3389/fgene.2018.00741
Journal volume & issue
Vol. 9

Abstract

At the end of introns, the polypyrimidine tract (Py) is often close to the 3′ AG, forming the consensus (Y)20NCAGgt in humans. Interestingly, we have found that in thousands of human genes the two can also be separated by purine-rich elements, including G tracts. These regulatory elements between the Py and 3′ AG (REPA) mainly regulate alternative 3′ splice sites (3′ SS) and intron retention. Here we show their widespread distribution and special properties across kingdoms. Whole-genome analysis of more than 1,000 species/lineages shows that purine-rich 3′ SS occur in up to about 60% of introns, and up to 18% of these introns contain REPA G tracts (REPAG), amounting to about 0.6 million 3′ SS in total. In particular, REPAGs are significantly enriched relative to their 3′ SS and genome backgrounds in metazoa and plants, and are strongly associated with alternative splicing of genes in diverse functional clusters. Cryptic splice sites harboring such G triplets and other purine triplets tend to be enriched (2- to 9-fold over the disrupted canonical 3′ SS) and are aberrantly used in cancer patients carrying mutations in SF3B1 or U2AF35, factors critical for branch point (BP) and 3′ AG recognition, respectively. Moreover, REPAGs are significantly associated with fewer BP motifs between positions −24 and −4, and in particular with their absence between positions −7 and −5 in the several model organisms examined. More distant BPs are associated with a higher frequency of alternative splicing in humans and zebrafish. REPAGs appear to have evolved in a species- or phylum-specific manner. Thus, the Py and 3′ AG are widely separated by REPAGs that have evolved differentially. Through alternative or aberrant splicing, this special 3′ SS arrangement likely contributes to the generation of diverse transcript or protein isoforms in biological functions and in disease.
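To make the 3′ SS arrangement described in the abstract concrete, the Python sketch below flags a REPAG-like layout in a single 3′ intron-end sequence: a pyrimidine run, then a spacer containing a G tract (GGG or longer), then the terminal AG. This is not the authors' genome-wide method. The function name `find_repag`, the thresholds `PY_MIN_LEN` and `GAP_MAX`, and the position convention (the G of the 3′ AG counted as −1) are hypothetical choices made only for illustration.

```python
import re
from typing import Optional

# Hypothetical thresholds for illustration only; NOT the criteria used in the paper.
PY_MIN_LEN = 10  # minimum length of a C/T run counted as a Py tract
GAP_MAX = 20     # maximum spacer length (nt) allowed between the Py tract and the 3' AG

PY_RUN = re.compile("[CT]{%d,}" % PY_MIN_LEN)
G_TRACT = re.compile("G{3,}")


def find_repag(intron_3p: str) -> Optional[dict]:
    """Return details of a G tract lying between the last Py run and the 3' AG, or None."""
    seq = intron_3p.upper().replace("U", "T")  # accept RNA or DNA input
    if not seq.endswith("AG"):
        return None  # not a candidate 3' splice site
    body = seq[:-2]  # everything upstream of the terminal AG
    runs = list(PY_RUN.finditer(body))
    if not runs:
        return None  # no pyrimidine run long enough to count as a Py tract
    py = runs[-1]  # the Py run closest to the 3' AG
    spacer = body[py.end():]  # sequence between the Py tract and the AG
    if len(spacer) > GAP_MAX:
        return None  # Py tract too far from the AG under the assumed threshold
    g = G_TRACT.search(spacer)
    if g is None:
        return None  # Py and AG are not separated by a G tract
    # First base of the G tract, counting the G of the 3' AG as -1 (assumed convention).
    g_start = py.end() + g.start() - len(seq)
    return {"py": py.group(), "spacer": spacer, "g_tract": g.group(), "g_start": g_start}


if __name__ == "__main__":
    print(find_repag("CTTTCTCTTTCCTTTGGGAGGCAG"))  # made-up REPAG-like site
    print(find_repag("TTTTTTTTTTTTTTTTTTTTCAG"))   # canonical Py-adjacent site
```

With these assumed thresholds, the made-up sequence `CTTTCTCTTTCCTTTGGGAGGCAG` is flagged with a G tract starting at −9, while the canonical-style site `TTTTTTTTTTTTTTTTTTTTCAG` returns `None` because its Py tract runs directly into the CAG.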

Keywords