A survey of k-mer methods and applications in bioinformatics

Camille Moeckel; Manvita Mareboina; Maxwell A. Konnaris; Candace S.Y. Chan; Ioannis Mouratidis; Austin Montgomery; Nikol Chantzi; Georgios A. Pavlopoulos; Ilias Georgakopoulos-Soares

Computational and Structural Biotechnology Journal (Dec 2024)

Affiliations

Camille Moeckel: Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
Manvita Mareboina: Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
Maxwell A. Konnaris: Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
Candace S.Y. Chan: Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
Ioannis Mouratidis: Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA; Huck Institute of the Life Sciences, Penn State University, University Park, Pennsylvania, USA
Austin Montgomery: Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
Nikol Chantzi: Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
Georgios A. Pavlopoulos: Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari 16672, Greece
Ilias Georgakopoulos-Soares: Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA; Huck Institute of the Life Sciences, Penn State University, University Park, Pennsylvania, USA; Corresponding author at: Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA.

Journal volume & issue: Vol. 23, pp. 2289–2303

Abstract

The rapid progression of genomics and proteomics has been driven by the advent of advanced sequencing technologies, large, diverse, and readily available omics datasets, and the evolution of computational data processing capabilities. The vast amount of data generated by these advancements necessitates efficient algorithms to extract meaningful information. K-mers serve as a valuable tool when working with large sequencing datasets, offering advantages in computational speed and memory efficiency and carrying the potential for intrinsic biological functionality. This review provides an overview of the methods, applications, and significance of k-mers in genomic and proteomic data analyses, as well as the utility of absent sequences, including nullomers and nullpeptides, in disease detection, vaccine development, therapeutics, and forensic science. Overall, the review highlights the pivotal role of k-mers in addressing current genomic and proteomic problems and underscores their potential for future breakthroughs in research.

Published in Computational and Structural Biotechnology Journal

ISSN: 2001-0370 (Online)
Publisher: Elsevier
Country of publisher: Netherlands
LCC subjects: Technology: Chemical technology: Biotechnology
Website: https://www.journals.elsevier.com/computational-and-structural-biotechnology-journal
