Scientific Reports (Nov 2022)

A draft chromosome-scale genome assembly of a commercial sugarcane

  • Jeremy R. Shearman,
  • Wirulda Pootakham,
  • Chutima Sonthirod,
  • Chaiwat Naktang,
  • Thippawan Yoocha,
  • Duangjai Sangsrakru,
  • Nukoon Jomchai,
  • Sissades Tongsima,
  • Jittima Piriyapongsa,
  • Chumpol Ngamphiw,
  • Nanchaya Wanasen,
  • Kittipat Ukoskit,
  • Prapat Punpee,
  • Peeraya Klomsa-ard,
  • Klanarong Sriroth,
  • Jisen Zhang,
  • Xingtan Zhang,
  • Ray Ming,
  • Somvong Tragoonrung,
  • Sithichoke Tangphatsornruang

DOI
https://doi.org/10.1038/s41598-022-24823-0
Journal volume & issue
Vol. 12, no. 1
pp. 1 – 8

Abstract

Sugarcane accounts for a large portion of the world's sugar production. Modern commercial cultivars are complex hybrids of Saccharum officinarum, S. spontaneum, and several other Saccharum species, resulting in an auto-allopolyploid with 8–12 copies of each chromosome. The current gold standard for genome assembly is to generate a long-read assembly and then scaffold it using chromatin conformation capture sequencing. We used the PacBio RSII and chromatin conformation capture sequencing to sequence and assemble the genome of a Southeast Asian commercial sugarcane cultivar known as Khon Kaen 3. The Khon Kaen 3 genome assembled into 104,477 contigs totalling 7 Gb, which scaffolded into 56 pseudochromosomes containing 5.2 Gb of sequence. Genome annotation produced 242,406 genes from 30,927 orthogroups. Aligning the Khon Kaen 3 genome sequence to S. officinarum and S. spontaneum revealed a high level of apparent recombination, indicating a chimeric assembly. This assembly error is explained by the high nucleotide identity between S. officinarum and S. spontaneum: 91.8% of the S. spontaneum sequence aligns to S. officinarum at 94% identity. The subgenomes of commercial sugarcane are thus so similar that using short reads to correct long PacBio reads produced chimeric long reads. Future attempts to sequence sugarcane must take this information into account.
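The assembly figures quoted in the abstract can be put in proportion with some simple arithmetic. The Python sketch below is only an illustration and is not part of the authors' pipeline. It uses the rounded values as reported (7 Gb, 5.2 Gb), so the derived numbers are approximate, and the function and variable names are hypothetical.

```python
# Figures as quoted in the abstract. The Gb values are rounded there,
# so everything derived from them is approximate.
N_CONTIGS = 104_477
CONTIG_TOTAL_BP = 7.0e9   # "totalling 7 Gb"
SCAFFOLDED_BP = 5.2e9     # "containing 5.2 Gb of sequence"
N_GENES = 242_406
N_ORTHOGROUPS = 30_927


def assembly_summary() -> dict:
    """Derive simple ratios from the reported assembly statistics."""
    return {
        # Share of contig sequence placed in the 56 pseudochromosomes.
        "anchored_fraction": SCAFFOLDED_BP / CONTIG_TOTAL_BP,
        # Arithmetic mean contig length (not N50, which the abstract does not give).
        "mean_contig_bp": CONTIG_TOTAL_BP / N_CONTIGS,
        # Average number of annotated genes per orthogroup.
        "genes_per_orthogroup": N_GENES / N_ORTHOGROUPS,
    }


summary = assembly_summary()
for name, value in summary.items():
    print(f"{name}: {value:,.3f}")
```

On these rounded inputs, roughly three quarters of the contig sequence is anchored to pseudochromosomes, the mean contig is about 67 kb long, and each orthogroup holds a little under eight annotated genes on average.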