Chromosomal-level genome assembly data from the pale chub, Zacco platypus (Jordan & Evermann, 1902)

Sang-Eun Nam; Jae-Sung Rhee

Data in Brief (Aug 2024)

Affiliations

Sang-Eun Nam: Department of Marine Science, College of Natural Sciences, Incheon National University, Incheon, 22012, South Korea
Jae-Sung Rhee: Department of Marine Science, College of Natural Sciences, Incheon National University, Incheon, 22012, South Korea; Research Institute of Basic Sciences, Incheon National University, Incheon 22012, South Korea; Yellow Sea Research Institute, Incheon 22012, South Korea; Corresponding author.

Journal volume & issue: Vol. 55
p. 110596

Abstract

The pale chub, Zacco platypus (Cypriniformes; Xenocyprididae; homotypic synonym: Opsariichthys platypus; Jordan & Evermann, 1902), is widely distributed in freshwater ecosystems throughout East Asia, including South Korea. In this study, we constructed a de novo genome assembly of Z. platypus to serve as a reference for fundamental and applied research. The assembly was generated using a combination of long-read Pacific Biosciences (PacBio) sequencing, short-read Illumina sequencing, and Hi-C sequencing technologies. The sequencing data for the draft genome comprised 16,422,113 reads from the HiFi library, 702,143,130 reads from the Illumina TruSeq library, and 250,789,660 reads from the Hi-C library. Assembly with Hifiasm resulted in 336 contigs, with an N50 length of 31.9 Mb. The final assembled genome size was 838.6 Mb. Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis against the Actinopterygii database indicated that 3,572 (98.1 %) of the expected genes were found in the assembly, with 3,521 (96.7 %) being single-copy and 51 (1.4 %) duplicated. Of the 319 Hi-C scaffolds, 24 exceeded 10 Mb and were thus classified as chromosome-level scaffolds. The assembled genome comprises 41.45 % repeat sequences. Gene annotation was performed using Illumina RNA-Seq and PacBio Iso-Seq data, based on repeat-masked genome sequences. The final annotation resulted in 34,036 protein-coding genes. This chromosomal-level genome assembly is expected to be a valuable resource for future health assessments in aquatic ecosystems, providing insights into the developmental, environmental, and ecological aspects of Z. platypus.
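
The abstract reports two length-based summary statistics: the contig N50 and the number of scaffolds longer than 10 Mb. As a minimal illustration of how such figures are conventionally computed, the Python sketch below uses the standard N50 definition and the 10 Mb cutoff stated above. It is not the authors' pipeline; the function names `n50` and `count_chromosome_scale` and the toy lengths are hypothetical and do not come from the Z. platypus assembly.

```python
def n50(lengths):
    """Return the N50 of a list of sequence lengths (in bp).

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no sequence lengths given")


def count_chromosome_scale(lengths, min_len=10_000_000):
    """Count sequences strictly longer than min_len (default 10 Mb)."""
    return sum(1 for length in lengths if length > min_len)


# Toy lengths for illustration only (not data from the paper).
toy_lengths = [40_000_000, 30_000_000, 12_000_000, 8_000_000, 500_000]

print("N50 (bp):", n50(toy_lengths))
print("Sequences > 10 Mb:", count_chromosome_scale(toy_lengths))
```

In practice the lengths would be read from the assembly FASTA or its index rather than typed in by hand.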

Published in Data in Brief

ISSN: 2352-3409 (Online)
Publisher: Elsevier
Country of publisher: United States
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Science (General)
Website: http://www.journals.elsevier.com/data-in-brief/
