Plant Methods (Jan 2020)

Large scale genome skimming from herbarium material for accurate plant identification and phylogenomics

  • Paul G. Nevill,
  • Xiao Zhong,
  • Julian Tonti-Filippini,
  • Margaret Byrne,
  • Michael Hislop,
  • Kevin Thiele,
  • Stephen van Leeuwen,
  • Laura M. Boykin,
  • Ian Small

DOI
https://doi.org/10.1186/s13007-019-0534-5
Journal volume & issue
Vol. 16, no. 1
pp. 1 – 8

Abstract

Read online

Abstract Background Herbaria are valuable sources of extensive curated plant material that are now accessible to genetic studies because of advances in high-throughput, next-generation sequencing methods. As an applied assessment of large-scale recovery of plastid and ribosomal genome sequences from herbarium material for plant identification and phylogenomics, we sequenced 672 samples covering 21 families, 142 genera and 530 named and proposed named species. We explored the impact of parameters such as sample age, DNA concentration and quality, read depth and fragment length on plastid assembly error. We also tested the efficacy of DNA sequence information for identifying plant samples using 45 specimens recently collected in the Pilbara. Results Genome skimming was effective at producing genomic information at large scale. Substantial sequence information on the chloroplast genome was obtained from 96.1% of samples, and complete or near-complete sequences of the nuclear ribosomal RNA gene repeat were obtained from 93.3% of samples. We were able to extract sequences for the core DNA barcode regions rbcL and matK from 96 to 93.3% of samples, respectively. Read quality and DNA fragment length had significant effects on sequencing outcomes and error correction of reads proved essential. Assembly problems were specific to certain taxa with low GC and high repeat content (Goodenia, Scaevola, Cyperus, Bulbostylis, Fimbristylis) suggesting biological rather than technical explanations. The structure of related genomes was needed to guide the assembly of repeats that exceeded the read length. DNA-based matching proved highly effective and showed that the efficacy for species identification declined in the order cpDNA >> rDNA > matK >> rbcL. Conclusions We showed that a large-scale approach to genome sequencing using herbarium specimens produces high-quality complete cpDNA and rDNA sequences as a source of data for DNA barcoding and phylogenomics.

Keywords