Scientific Data (Jun 2024)

Integrating Iso-seq and RNA-seq data for the reannotation of the greater amberjack genome

  • Yuanli Zhao,
  • Zonggui Chen,
  • Meidi Hu,
  • Hairong Liu,
  • Haiping Zhao,
  • Yang Huang,
  • Mouyan Jiang,
  • Shengkang Li,
  • Guangli Li,
  • Chunhua Zhu,
  • Wei Hu,
  • Daji Luo

DOI
https://doi.org/10.1038/s41597-024-03495-7
Journal volume & issue
Vol. 11, no. 1
pp. 1 – 10

Abstract


The greater amberjack (Seriola dumerili) is a commercially valuable fishery species with a worldwide distribution. Transcriptome-based studies on S. dumerili have been limited by an inadequate reference genome and a lack of well-annotated full-length transcripts. In this study, a total of 12 tissues from juvenile and adult fish of both sexes were collected for next-generation RNA sequencing (RNA-seq) and full-length isoform sequencing (Iso-seq). Iso-seq yielded 163,218, 149,716, and 189,169 high-quality unique transcript sequences, with N50 values of 5,441, 5,255, and 5,939, from juvenile, adult male, and adult female S. dumerili, respectively. We integrated the Iso-seq and RNA-seq data to construct a comprehensive gene annotation and systematically profiled the dynamics of gene expression across the 12 tissues. Our gene models were more detailed and accurate than those from NCBI and Ensembl, with more precise polyA locations. These resources serve as a foundation for functional genomic studies and provide valuable insights into the molecular mechanisms underlying the development, reproduction, and commercial traits of amberjack.