Data in Brief (Aug 2024)
Chromosomal-level genome assembly data from the pale chub, Zacco platypus (Jordan & Evermann, 1902)
Abstract
The pale chub, Zacco platypus (Cypriniformes; Xenocyprididae; homotypic synonym: Opsariichthys platypus; Jordan & Evermann, 1902), is widely distributed in freshwater ecosystems throughout East Asia, including South Korea. In this study, we constructed a de novo genome assembly of Z. platypus to serve as a reference for fundamental and applied research. The assembly was generated using a combination of long-read Pacific Biosciences (PacBio) sequencing, short-read Illumina sequencing, and Hi-C sequencing. The sequencing data comprised 16,422,113 reads from the HiFi library, 702,143,130 reads from the Illumina TruSeq library, and 250,789,660 reads from the Hi-C library. Assembly with Hifiasm produced 336 contigs with an N50 length of 31.9 Mb. The final assembled genome size was 838.6 Mb. Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis against the Actinopterygii database identified 3,572 (98.1 %) of the expected genes in the assembly, comprising 3,521 (96.7 %) single-copy and 51 (1.4 %) duplicated genes. Of the 319 Hi-C scaffolds, 24 exceeded 10 Mb and were thus classified as chromosome-level scaffolds. Repeat sequences make up 41.45 % of the assembled genome. Gene annotation was performed on the repeat-masked genome sequence using Illumina RNA-Seq and PacBio Iso-Seq data, yielding 34,036 protein-coding genes. This chromosomal-level genome assembly is expected to be a valuable resource for future health assessments of aquatic ecosystems and to provide insights into the developmental, environmental, and ecological aspects of Z. platypus.
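The summary statistics quoted above (contig N50, the 10 Mb cutoff for chromosome-level scaffolds, and the BUSCO percentages) follow from simple arithmetic on sequence lengths and gene counts. The Python sketch below illustrates that arithmetic only; it is not the pipeline used in this study. The function names are hypothetical, the toy lengths are invented for illustration, and the BUSCO total of 3,640 genes is an assumption (the size of the actinopterygii_odb10 set), since the abstract does not state the denominator.

```python
from typing import Iterable, List


def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the length of the sequence at which the running
    total, taken from longest to shortest, first reaches half the
    summed length of all sequences."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return ordered[-1]  # not reached for non-empty positive input


def chromosome_level(lengths: Iterable[int],
                     min_length: int = 10_000_000) -> List[int]:
    """Keep scaffolds strictly longer than min_length (10 Mb here,
    matching the cutoff described in the abstract)."""
    return [length for length in lengths if length > min_length]


def busco_percent(count: int, total: int) -> float:
    """Percentage of BUSCO genes in a category, rounded to one decimal."""
    return round(100 * count / total, 1)


# Toy lengths, invented for illustration (not the real assembly).
toy_contigs = [40, 30, 20, 10]
toy_scaffolds = [31_900_000, 12_000_000, 10_000_000, 9_500_000]

# ASSUMPTION: 3,640 is the actinopterygii_odb10 gene count; the
# abstract reports only the category counts and percentages.
ASSUMED_BUSCO_TOTAL = 3640

if __name__ == "__main__":
    print(n50(toy_contigs))
    print(chromosome_level(toy_scaffolds))
    print(busco_percent(3572, ASSUMED_BUSCO_TOTAL))
```

Under the assumed total of 3,640, the counts 3,572, 3,521, and 51 round to the 98.1 %, 96.7 %, and 1.4 % reported above. In practice the lengths would be read from the assembly FASTA index or the assembler's own report rather than typed in by hand.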