Frontiers in Plant Science (Jun 2023)
The Viscum album Gene Space database
Abstract
The hemiparasitic flowering plant Viscum album (European mistletoe) is known for its unusual life cycle, extraordinary biochemical properties, and extremely large genome. Its genome is estimated to be 30 times larger than the human genome and 600 times larger than that of the model plant Arabidopsis thaliana. To gain insight into the Gene Space of the genome, defined as the space including and surrounding protein-coding regions, a transcriptome project based on PacBio sequencing was recently conducted. The resulting database contains sequences of 39,092 different open reading frames encoding 32,064 distinct proteins. Based on ‘Benchmarking Universal Single-Copy Orthologs’ (BUSCO) analysis, the completeness of this database was estimated at about 78%. To further develop the database, we performed an Illumina-based transcriptome project of V. album organs harvested in summer and winter and combined the data from both sequencing strategies. The new V. album Gene Space database II (VaGs II) contains 90,039 sequences and has a completeness of 93% as revealed by BUSCO analysis. Sequences from other organisms, particularly fungi, which are known to colonize mistletoe leaves, have been removed. To evaluate the quality of the new database, proteome data of a mitochondrial fraction of V. album were re-analyzed. Compared to the original evaluation published five years ago, nearly 1000 additional proteins could be identified in the mitochondrial fraction, providing new insights into the Oxidative Phosphorylation System of V. album. The VaGs II database is available at https://viscumalbum.pflanzenproteomik.de/. Furthermore, all V. album sequences have been uploaded to the European Nucleotide Archive (ENA).
Keywords