Simrank: Rapid and sensitive general-purpose k-mer search tool

Brodie Eoin L; Singh Navjeet NS; Alekseyenko Alexander V; Karaoz Ulas; Keller Keith; DeSantis Todd Z; Pei Zhiheng; Andersen Gary L; Larsen Niels

doi:10.1186/1472-6785-11-11

BMC Ecology (Apr 2011)

Journal volume & issue: Vol. 11, no. 1, p. 11

Abstract

Background: Terabyte-scale collections of string-encoded data are expected from consortia efforts such as the Human Microbiome Project (http://nihroadmap.nih.gov/hmp). Rapid k-mer matching strategies enable similarity searches both within and between projects. Software applications for sequence database partitioning, guide tree estimation, molecular classification and alignment acceleration have benefited from embedded k-mer searches as sub-routines. However, a rapid, general-purpose, open-source, flexible, stand-alone k-mer tool has not been available.

Results: Here we present a stand-alone utility, Simrank, which allows users to rapidly identify the database strings most similar to query strings. In performance tests of Simrank and related tools on DNA, RNA, protein and human-language data, Simrank was 10X to 928X faster, depending on the dataset.

Conclusions: Simrank provides molecular ecologists with a high-throughput, open-source choice for comparing large sequence sets to find similarity.
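To make the idea of a k-mer similarity search concrete, the following is a minimal Python sketch of ranking database strings by the k-mers they share with a query. It is not Simrank's implementation: the function names (`kmers`, `build_index`, `rank`), the toy sequences, the choice of k = 4, and the scoring rule (shared unique k-mers divided by the smaller of the two k-mer set sizes) are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def kmers(s, k):
    """Return the set of unique substrings of length k in s."""
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def build_index(database, k):
    """Map each k-mer to the names of the database strings containing it.

    Also returns the number of unique k-mers per database string.
    """
    index = defaultdict(set)
    sizes = {}
    for name, seq in database.items():
        ks = kmers(seq, k)
        sizes[name] = len(ks)
        for km in ks:
            index[km].add(name)
    return index, sizes


def rank(query, index, sizes, k, top=3):
    """Rank database strings by the fraction of k-mers shared with query.

    Score (an assumed rule, for illustration only): shared unique k-mers
    divided by the smaller of the two unique k-mer counts.
    """
    q = kmers(query, k)
    shared = Counter()
    for km in q:
        for name in index.get(km, ()):
            shared[name] += 1
    scores = [(name, n / min(len(q), sizes[name])) for name, n in shared.items()]
    # Highest score first; ties broken by name for a stable order.
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores[:top]


# Hypothetical toy data, not taken from the paper.
database = {"a": "ACGTACGT", "b": "TTTTGGGG", "c": "ACGTTTTT"}
index, sizes = build_index(database, k=4)
ranked = rank("ACGTACGA", index, sizes, k=4)
print(ranked)
```

Because the index is built once and each query only touches the entries for its own k-mers, database strings that share nothing with the query (here, `b`) are never scored at all. That property is what makes this family of methods attractive for large collections.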

Published in BMC Ecology

ISSN: 1472-6785 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Science: Biology (General): Ecology
Website: https://bmcecol.biomedcentral.com
