Scientific Data (Oct 2024)
The North Pacific Eukaryotic Gene Catalog of metatranscriptome assemblies and annotations
Abstract
Marine microbial eukaryotes (protists) perform essential metabolic functions in oceanic ecosystems. The diversity of protist functions remains poorly understood because few species have been isolated in laboratory settings. Metatranscriptomes provide an invaluable tool for exploring protist diversity and genetic capacities within their natural habitats. Here, we introduce the North Pacific Eukaryotic Gene Catalog, a compilation of data derived from 261 metatranscriptomes: 169 were derived from samples collected on three meridional surface transects along 158°W, each spanning ~20 degrees of latitude from the North Pacific Subtropical Gyre (NPSG) to the North Pacific Transition Zone (NPTZ); 92 were derived from two diel-resolved field studies, one in the NPSG (157°W, 23°N) and one in the NPTZ (158°W, 41°N). The metatranscriptome sequences were assembled de novo into 175 assemblies and pooled into five datasets, each containing between 22 M and 49 M contigs clustered at 99% protein identity. Assemblies were annotated by taxonomy and function, and enumerated by short-read alignment. All data are available in the Zenodo repository, with the underlying code available on GitHub.