Biotechnology for Biofuels (May 2017)

De novo assembly, functional annotation, and analysis of the giant reed (Arundo donax L.) leaf transcriptome provide tools for the development of a biofuel feedstock

  • Chiara Evangelistella,
  • Alessio Valentini,
  • Riccardo Ludovisi,
  • Andrea Firrincieli,
  • Francesco Fabbrini,
  • Simone Scalabrin,
  • Federica Cattonaro,
  • Michele Morgante,
  • Giuseppe Scarascia Mugnozza,
  • Joost J. B. Keurentjes,
  • Antoine Harfouche

DOI
https://doi.org/10.1186/s13068-017-0828-7
Journal volume & issue
Vol. 10, no. 1
pp. 1 – 24

Abstract


Background: Arundo donax has attracted renewed interest as a candidate energy crop for biomass-to-liquid fuel conversion processes and biorefineries, owing to its high productivity, adaptability to marginal land conditions, and suitability for biofuel and biomaterial production. Despite its importance, the genomic resources available to support the improvement of this species are still limited.
Results: We used RNA sequencing (RNA-Seq) to de novo assemble and characterize the A. donax leaf transcriptome. Sequencing generated 1249 million clean reads, which were assembled using single-k-mer and multi-k-mer approaches into 62,596 unique sequences (unitranscripts) with an N50 of 1134 bp. The TransDecoder and Trinotate software suites were used to obtain putative coding sequences and to annotate them by mapping to the UniProtKB/Swiss-Prot and UniRef90 databases, searching for known transcripts, proteins, protein domains, and signal peptides. The unitranscripts were further annotated by mapping them to the NCBI non-redundant, GO, and KEGG pathway databases using Blast2GO. The transcriptome was also characterized by BLAST searches for homologous transcripts of key genes involved in important metabolic pathways, such as lignin, cellulose, purine, and thiamine biosynthesis and carbon fixation. In addition, homologous transcripts of key genes involved in stomatal development and of genes coding for stress-associated proteins (SAPs) were identified. Finally, 8364 simple sequence repeat (SSR) markers were identified and surveyed. SSRs were more abundant in non-coding regions (63.18%) than in coding regions (36.82%). This SSR dataset represents the first marker catalogue of A. donax. Fifty-three SSRs (PolySSRs) were predicted to be polymorphic between ecotype-specific assemblies, suggesting genetic variability among the studied ecotypes.
Conclusions: This study provides the first publicly available leaf transcriptome for the bioenergy crop A. donax. The functional annotation and characterization of the transcriptome will help provide insight into the molecular mechanisms underlying the species' extreme adaptability. The identification of homologous transcripts involved in key metabolic pathways offers a platform for directing future efforts in the genetic improvement of this species. The identified SSRs will facilitate the harnessing of untapped genetic diversity. This transcriptome should be of value to ongoing functional genomics and genetic studies in this crop of paramount economic importance.
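
For readers unfamiliar with two of the quantities reported above, the following is a minimal Python sketch of how an assembly N50 and a simple SSR scan can be computed. It is an illustration only and not the pipeline used in the study: the function names `n50` and `find_ssrs` are hypothetical, and the minimum repeat counts (10 for mononucleotide, 6 for dinucleotide, 5 for tri- to hexanucleotide motifs) are assumed defaults, since the abstract does not state the thresholds that were applied.

```python
import re

# Assumed minimum repeat counts per motif length (not taken from the paper).
DEFAULT_MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of all assembled bases."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def _is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit
    (e.g. 'AT' is primitive, 'AA' and 'ATAT' are not)."""
    return (motif + motif).find(motif, 1) == len(motif)


def find_ssrs(seq, min_repeats=None):
    """Scan a nucleotide sequence for perfect SSRs.

    Returns a list of (start, motif, repeat_count) tuples sorted by
    0-based start position.
    """
    min_repeats = min_repeats or DEFAULT_MIN_REPEATS
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # One motif of length k followed by at least n - 1 further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if not _is_primitive(motif):
                continue  # already reported under a shorter motif length
            hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


if __name__ == "__main__":
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
    print(find_ssrs("GGATATATATATATCC"))
```

Applied to a real assembly, `n50` would take the lengths of all unitranscripts, and `find_ssrs` would be run per transcript; classifying each hit as coding or non-coding would additionally require the predicted coding-sequence coordinates.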

Keywords