PLoS ONE (Jan 2022)
Transcriptomic complexity of the human malaria parasite Plasmodium falciparum revealed by long-read sequencing.
Abstract
The genome of the human malaria parasite Plasmodium falciparum is incompletely annotated, and the current annotation does not accurately represent the transcriptomic diversity of this species. To address this gap, we performed long-read transcriptomic sequencing. 5' capped mRNA was enriched from samples of total and nuclear-fractionated RNA from intra-erythrocytic stages and converted to cDNA libraries, which were sequenced on PacBio and Nanopore long-read platforms. A total of 12,495 novel isoforms were annotated from the data. Alternative 5' and 3' ends represent the majority of isoform events among the novel isoforms, with retained introns being the next most common event. The majority of alternative 5' ends correspond to genomic regions with features similar to those of the reference transcript 5' ends. However, a minority of alternative 5' ends showed markedly different features, including locations within protein-coding regions. Alternative 3' ends showed features similar to those of the reference transcript 3' ends, notably adenine-rich termination signals. No distinguishing features of retained introns were observed, apart from a tendency towards shorter length and greater GC content compared with spliced introns. Expression of antisense and retained intron isoforms was detected at different intra-erythrocytic stages, suggesting developmental regulation of these isoform events. To gain insight into the possible functions of the novel isoforms, we assessed their protein-coding potential. Variants of P. falciparum proteins and novel proteins encoded by alternative open reading frames suggest that P. falciparum has a greater proteomic repertoire than the current annotation indicates. We provide a catalog of annotated transcripts and encoded alternative proteins to support further studies on gene and protein regulation of this pathogen.