kmerDB: A database encompassing the set of genomic and proteomic sequence information for each species

Ioannis Mouratidis; Fotis A. Baltoumas; Nikol Chantzi; Michail Patsakis; Candace S.Y. Chan; Austin Montgomery; Maxwell A. Konnaris; Eleni Aplakidou; George C. Georgakopoulos; Anshuman Das; Dionysios V. Chartoumpekis; Jasna Kovac; Georgios A. Pavlopoulos; Ilias Georgakopoulos-Soares

Computational and Structural Biotechnology Journal (Dec 2024)

Affiliations

Ioannis Mouratidis: Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA; Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
Fotis A. Baltoumas: Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, 16672, Greece
Nikol Chantzi: Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
Michail Patsakis: Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
Candace S.Y. Chan: Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
Austin Montgomery: Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
Maxwell A. Konnaris: Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA; Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA; Department of Statistics, The Pennsylvania State University, University Park, PA, USA
Eleni Aplakidou: Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, 16672, Greece; Department of Basic Sciences, School of Medicine, University of Crete, Heraklion, Greece
George C. Georgakopoulos: National Technical University of Athens, School of Electrical and Computer Engineering, Athens, Greece
Anshuman Das: Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
Dionysios V. Chartoumpekis: Service of Endocrinology, Diabetology and Metabolism, Lausanne University Hospital, Lausanne, Switzerland
Jasna Kovac: Department of Food Science, The Pennsylvania State University, University Park, PA 16802, USA
Georgios A. Pavlopoulos: Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, 16672, Greece; Center for New Biotechnologies and Precision Medicine, School of Medicine, National and Kapodistrian University of Athens, Athens, 11527, Greece; Corresponding author at: Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, 16672, Greece.
Ilias Georgakopoulos-Soares: Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA; Corresponding author.

Journal volume & issue: Vol. 23, pp. 1919–1928

Abstract

The decrease in sequencing expenses has facilitated the creation of reference genomes and proteomes for an expanding array of organisms. Nevertheless, no established repository that details organism-specific genomic and proteomic sequences of specific lengths, referred to as kmers, exists to our knowledge. In this article, we present kmerDB, a database accessible through an interactive web interface that provides kmer-based information from genomic and proteomic sequences in a systematic way. kmerDB currently contains 202,340,859,107 base pairs and 19,304,903,356 amino acids, spanning 54,039 and 21,865 reference genomes and proteomes, respectively, as well as 6,905,362 and 149,305,183 genomic and proteomic species-specific sequences, termed quasi-primes. Additionally, we provide access to 5,186,757 nucleic and 214,904,089 peptide sequences absent from every genome and proteome, termed primes. kmerDB features a user-friendly interface offering various search options and filters for easy parsing and searching. The service is available at: www.kmerdb.com.
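The three terms used in the abstract (kmers, quasi-primes, and primes) can be illustrated with a minimal Python sketch. This is not the kmerDB pipeline: the toy sequences, the species names, and the function names `kmers`, `quasi_primes`, and `primes` are all hypothetical, and the sketch ignores reverse complements and any other processing the database may apply.

```python
from itertools import product


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings (kmers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def quasi_primes(species_kmers: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each species, keep only the kmers found in no other species."""
    result = {}
    for name, own in species_kmers.items():
        others = set().union(
            *(s for other, s in species_kmers.items() if other != name)
        )
        result[name] = own - others
    return result


def primes(species_kmers: dict[str, set[str]], k: int,
           alphabet: str = "ACGT") -> set[str]:
    """Return every possible kmer over the alphabet that no species contains."""
    all_possible = {"".join(p) for p in product(alphabet, repeat=k)}
    observed = set().union(*species_kmers.values())
    return all_possible - observed


# Toy sequences invented for illustration; they are not taken from kmerDB.
toy = {
    "species_a": "ACGTAC",
    "species_b": "GTACCA",
    "species_c": "TTTTGA",
}
k = 3
species_kmers = {name: kmers(seq, k) for name, seq in toy.items()}
qp = quasi_primes(species_kmers)   # species-specific kmers per species
pr = primes(species_kmers, k)      # kmers absent from every toy sequence
```

At this scale every set fits in memory; enumerating all possible kmers grows as the alphabet size raised to the power k, so the same brute-force approach would not carry over unchanged to longer kmers or to amino-acid alphabets.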

Published in Computational and Structural Biotechnology Journal

ISSN: 2001-0370 (Online)
Publisher: Elsevier
Country of publisher: Netherlands
LCC subjects: Technology: Chemical technology: Biotechnology
Website: https://www.journals.elsevier.com/computational-and-structural-biotechnology-journal
