Hydra: a scalable proteomic search engine which utilizes the Hadoop distributed computing framework

Lewis Steven; Csordas Attila; Killcoyne Sarah; Hermjakob Henning; Hoopmann Michael R; Moritz Robert L; Deutsch Eric W; Boyle John

doi:10.1186/1471-2105-13-324

BMC Bioinformatics (Dec 2012)

Journal volume & issue: Vol. 13, no. 1, p. 324

Abstract

Background: For shotgun mass spectrometry-based proteomics, the most computationally expensive step is matching the spectra against an increasingly large database of sequences and their post-translational modifications with known masses. Each mass spectrometer can generate data at an astonishingly high rate, and the scope of what is searched for is continually increasing. Therefore, solutions for improving our ability to perform these searches are needed.

Results: We present a sequence database search engine that is specifically designed to run efficiently on the Hadoop MapReduce distributed computing framework. The search engine implements the K-score algorithm, generating comparable output for the same input files as the original implementation. The scalability of the system is shown, and the architecture required for the development of such distributed processing is discussed.

Conclusion: The software is scalable in its ability to handle a large peptide database, numerous modifications and large numbers of spectra. Performance scales with the number of processors in the cluster, allowing throughput to expand with the available resources.

Published in BMC Bioinformatics

ISSN: 1471-2105 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Biology (General)
Website: http://www.biomedcentral.com/bmcbioinformatics/
