G3: Genes, Genomes, Genetics (Feb 2017)

De Novo Genome and Transcriptome Assembly of the Canadian Beaver (Castor canadensis)

  • Si Lok,
  • Tara A. Paton,
  • Zhuozhi Wang,
  • Gaganjot Kaur,
  • Susan Walker,
  • Ryan K. C. Yuen,
  • Wilson W. L. Sung,
  • Joseph Whitney,
  • Janet A. Buchanan,
  • Brett Trost,
  • Naina Singh,
  • Beverly Apresto,
  • Nan Chen,
  • Matthew Coole,
  • Travis J. Dawson,
  • Karen Ho,
  • Zhizhou Hu,
  • Sanjeev Pullenayegum,
  • Kozue Samler,
  • Arun Shipstone,
  • Fiona Tsoi,
  • Ting Wang,
  • Sergio L. Pereira,
  • Pirooz Rostami,
  • Carol Ann Ryan,
  • Amy Hin Yan Tong,
  • Karen Ng,
  • Yogi Sundaravadanam,
  • Jared T. Simpson,
  • Burton K. Lim,
  • Mark D. Engstrom,
  • Christopher J. Dutton,
  • Kevin C. R. Kerr,
  • Maria Franke,
  • William Rapley,
  • Richard F. Wintle,
  • Stephen W. Scherer

DOI
https://doi.org/10.1534/g3.116.038208
Journal volume & issue
Vol. 7, no. 2
pp. 755–773

Abstract


The Canadian beaver (Castor canadensis) is the largest indigenous rodent in North America. We report a draft annotated assembly of the beaver genome. It is the first for a large rodent and the first mammalian genome assembled directly from uncorrected, moderate-coverage (<30×) long reads generated by single-molecule sequencing. The genome size, estimated by k-mer analysis, is 2.7 Gb. We assembled the beaver genome using the new Canu assembler, which is optimized for noisy reads. The resulting assembly was refined with Pilon using short reads (80×) and checked for accuracy by congruency with an independent short-read assembly. We scaffolded the assembly using exon–gene models derived from 9805 full-length open reading frames (FL-ORFs) constructed from the beaver leukocyte and muscle transcriptomes. The final assembly comprised 22,515 contigs with a contig N50 of 278,680 bp and a scaffold N50 of 317,558 bp. Maximum contig and scaffold lengths were 3.3 and 4.2 Mb, respectively, and the combined scaffold length represented 92% of the estimated genome size. The completeness and accuracy of the scaffold assembly were demonstrated by the precise exon placement for 91.1% of the 9805 assembled FL-ORFs and 83.1% of the BUSCO (Benchmarking Universal Single-Copy Orthologs) gene set used to assess the quality of genome assemblies. Genes involved in dentition and enamel deposition were well represented. These are defining characteristics of rodents, and the beaver is well endowed with them. The study provides insights into genome assembly and an important genomics resource for Castoridae and rodent evolutionary biology.
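The abstract reports contig and scaffold N50 values and a genome size estimated by k-mer analysis, but it does not spell out either calculation. The following Python sketch illustrates only the standard N50 definition and one common simplified k-mer estimate. It is not the authors' pipeline. The k-mer estimate divides the total k-mer count by the depth of the main histogram peak. The function names, the `min_depth` error cutoff, and the toy inputs are hypothetical.

```python
from typing import Dict, Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of all assembled bases."""
    sorted_lengths = sorted(lengths, reverse=True)
    if not sorted_lengths:
        raise ValueError("no sequence lengths given")
    half_total = sum(sorted_lengths) / 2
    running = 0
    for length in sorted_lengths:
        running += length
        if running >= half_total:
            return length
    return sorted_lengths[-1]  # defensive fallback; not reached for non-negative lengths


def kmer_genome_size(histogram: Dict[int, int], min_depth: int) -> float:
    """Simplified genome-size estimate from a k-mer depth histogram (assumption:
    a haploid-like main peak and no repeat or heterozygosity modelling).

    histogram maps depth -> number of distinct k-mers observed at that depth.
    Depths below min_depth (hypothetical cutoff) are treated as error k-mers.
    """
    kept = {d: n for d, n in histogram.items() if d >= min_depth}
    if not kept:
        raise ValueError("no k-mers at or above min_depth")
    total_kmers = sum(d * n for d, n in kept.items())  # total k-mer instances
    peak_depth = max(kept, key=lambda d: kept[d])      # depth of the main peak
    return total_kmers / peak_depth


if __name__ == "__main__":
    print(n50([100, 80, 60, 40, 20]))  # 80
    toy_hist = {1: 5000, 2: 800, 10: 100, 19: 200, 20: 300, 21: 200, 30: 10}
    print(kmer_genome_size(toy_hist, min_depth=5))  # 765.0
```

For `[100, 80, 60, 40, 20]` the total is 300. The cumulative sum first reaches half of it (150) at the 80-bp sequence, so the N50 is 80. N50 uses the assembly's own total length. Using the estimated genome size as the denominator instead gives the related NG50 statistic.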

Keywords