BMC Genomics (Jul 2017)

Draft sequencing and assembly of the genome of the world’s largest fish, the whale shark: Rhincodon typus Smith 1828

  • Timothy D. Read,
  • Robert A. Petit,
  • Sandeep J. Joseph,
  • Md. Tauqeer Alam,
  • M. Ryan Weil,
  • Maida Ahmad,
  • Ravila Bhimani,
  • Jocelyn S. Vuong,
  • Chad P. Haase,
  • D. Harry Webb,
  • Milton Tan,
  • Alistair D. M. Dove

DOI
https://doi.org/10.1186/s12864-017-3926-9
Journal volume & issue
Vol. 18, no. 1
pp. 1 – 10

Abstract

Background The whale shark (Rhincodon typus) has by far the largest body size of any elasmobranch (shark or ray) species. Therefore, it is also the largest extant species of the paraphyletic assemblage commonly referred to as fishes. As both a phenotypic extreme and a member of the group Chondrichthyes – the sister group to the remaining gnathostomes, which include all tetrapods and therefore also humans – its genome is of substantial comparative interest. Whale sharks are also listed as an endangered species on the International Union for Conservation of Nature’s Red List of threatened species and are of growing popularity both as a target of ecotourism and as a charismatic conservation ambassador for the pelagic ecosystem. A genome map for this species would aid in defining effective conservation units and understanding global population structure.
Results We characterised the nuclear genome of the whale shark using next-generation sequencing (454, Illumina) and de novo assembly and annotation methods, based on material collected from the Georgia Aquarium. The data set consisted of 878,654,233 reads, which yielded a draft assembly of 1,213,200 contigs and 997,976 scaffolds. The estimated genome size was 3.44 Gb. As expected, the proteome of the whale shark was most closely related to the only other complete genome of a cartilaginous fish, the holocephalan elephant shark. The whale shark contained a novel Toll-like receptor (TLR) protein with sequence similarity to both the TLR4 and TLR13 proteins of mammals and TLR21 of teleosts. The data are publicly available on GenBank, FigShare, and from the NCBI Short Read Archive under accession number SRP044374.
Conclusions This represents the first shotgun elasmobranch genome and will aid studies of molecular systematics, biogeography, genetic differentiation, and conservation genetics in this and other shark species, as well as providing comparative data for studies of evolutionary biology and immunology across the jawed vertebrate lineages.

Keywords