Peer Community Journal (Sep 2022)

EukProt: A database of genome-scale predicted proteins across the diversity of eukaryotes

  • Richter, Daniel J.
  • Berney, Cédric
  • Strassert, Jürgen F. H.
  • Poh, Yu-Ping
  • Herman, Emily K.
  • Muñoz-Gómez, Sergio A.
  • Wideman, Jeremy G.
  • Burki, Fabien
  • de Vargas, Colomban

DOI
https://doi.org/10.24072/pcjournal.173
Journal volume & issue
Vol. 2

Abstract

EukProt is a database of published and publicly available predicted protein sets selected to represent the breadth of eukaryotic diversity, currently including 993 species from all major supergroups as well as orphan taxa. The goal of the database is to provide a single, convenient resource for gene-based research across the spectrum of eukaryotic life, such as phylogenomics and gene family evolution. Each species is placed within the UniEuk taxonomic framework in order to facilitate downstream analyses, and each data set is associated with a unique, persistent identifier to facilitate comparison and replication among analyses. The database is regularly updated, and all versions will be permanently stored and made available via FigShare. The current version includes a number of updates, notably ‘The Comparative Set’ (TCS), a reduced set of 196 predicted proteomes chosen for high estimated completeness while maintaining substantial phylogenetic breadth. A BLAST web server and graphical displays of data set completeness are available at http://evocellbio.com/eukprot/. We invite the community to provide suggestions for new data sets and new annotation features to be included in subsequent versions, with the goal of building a collaborative resource that will promote research to understand eukaryotic diversity and diversification.