RNA Biology (Dec 2023)
PacBio full-length transcriptome analysis provides new insights into transcription of chloroplast genomes
Abstract
Chloroplast and mitochondrial DNA (cpDNA and mtDNA) are separate from nuclear DNA (nuDNA) in a eukaryotic cell. The transcription system of chloroplasts differs from those of mitochondria and the eukaryotic nucleus. In contrast to that of nuDNA and animal mtDNA, the transcription of cpDNA is still not well understood, primarily because transcription initiation sites (TISs) and transcription termination sites (TTSs) have not been identified on the genome scale. In the present study, we characterized the transcription of chloroplast (cp) genes more accurately and comprehensively using PacBio full-length transcriptome data from Arabidopsis thaliana. The major findings included the discovery of four types of artifacts, the validation and correction of cp gene annotations, the exact identification of TISs that start with G, and the discovery of polyA-like sites as TTSs. Notably, we propose a new model to explain cp transcription initiation and termination at the whole-genome level. The four types of artifacts, together with degraded RNAs and splicing intermediates, deserve attention from researchers working with PacBio full-length transcriptome data, as these contaminant sequences can lead to incorrect downstream analyses. Cp transcription initiates at multiple promoters and terminates at polyA-like sites. Our study provides new insights into cp transcription and new clues for studying the evolution of promoters, TISs, TTSs and polyA tails of eukaryotic genes.
Keywords