Algorithms (Apr 2009)

Fast Structural Alignment of Biomolecules Using a Hash Table, N-Grams and String Descriptors

  • Robert Preissner,
  • Janusz M. Bujnicki,
  • Thomas Steinke,
  • Peter Moor,
  • Knut Reinert,
  • Kristian Rother,
  • Raphael André Bauer

DOI
https://doi.org/10.3390/a2020692
Journal volume & issue
Vol. 2, no. 2
pp. 692 – 709

Abstract

This work presents a generalized approach for the fast structural alignment of thousands of macromolecular structures. The method uses string representations of macromolecular structures and a hash table that stores n-grams of a given size for searching. To this end, structure-to-string translators were implemented for protein and RNA structures. A query against the index is performed in two hierarchical steps to combine speed and precision. In the first step, the query structure is translated into n-grams, and all target structures containing these n-grams are retrieved from the hash table. In the second step, the corresponding n-grams of the query and each target structure are aligned, and after each alignment a score is calculated based on the matching n-grams of query and target. The extensible framework enables the user to query and structurally align thousands of protein and RNA structures on a commodity machine and is available as open source from http://lajolla.sf.net.
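The two-step query described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`ngrams`, `build_index`, `query`), the three-letter alphabet in the toy strings, and the choice of n = 4 are all assumptions made for illustration. In particular, the second step here only counts matching n-gram pairs as a stand-in score, whereas the abstract describes aligning the corresponding n-grams before scoring.

```python
from collections import defaultdict


def ngrams(s, n):
    """Return all overlapping substrings of length n with their start positions."""
    return [(s[i:i + n], i) for i in range(len(s) - n + 1)]


def build_index(structures, n):
    """Map each n-gram to the (structure id, position) pairs where it occurs."""
    index = defaultdict(list)
    for sid, s in structures.items():
        for gram, pos in ngrams(s, n):
            index[gram].append((sid, pos))
    return index


def query(index, q, n):
    """Step 1: look up every query n-gram in the hash table and collect targets.
    Step 2 (simplified assumption): score each target by its number of
    matching (query position, target position) n-gram pairs."""
    hits = defaultdict(list)
    for gram, qpos in ngrams(q, n):
        for sid, tpos in index.get(gram, []):
            hits[sid].append((qpos, tpos))
    scores = {sid: len(pairs) for sid, pairs in hits.items()}
    # Highest score first; ties broken by structure id for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical structure strings; the real string descriptors are not shown here.
structures = {"t1": "HHHHEEEELLHH", "t2": "EEEELLLL", "t3": "LLLLLLLL"}
index = build_index(structures, n=4)
ranking = query(index, "HHEEEE", n=4)
print(ranking)
```

Because the lookup touches only the n-grams present in the query, targets that share none of them (such as `t3` above) are never examined, which is where the speed of the first step comes from. The position pairs collected in `hits` are what a real second step would use as the correspondences to align.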

Keywords