Algorithms for Molecular Biology (Jan 2010)

A tree-based method for the rapid screening of chemical fingerprints

  • Pedersen Christian NS,
  • Nielsen Jesper,
  • Kristensen Thomas G

DOI
https://doi.org/10.1186/1748-7188-5-9
Journal volume & issue
Vol. 5, no. 1
p. 9

Abstract

Read online

Abstract Background The fingerprint of a molecule is a bitstring based on its structure, constructed such that structurally similar molecules will have similar fingerprints. Molecular fingerprints can be used in an initial phase of drug development for identifying novel drug candidates by screening large databases for molecules with fingerprints similar to a query fingerprint. Results In this paper, we present a method which efficiently finds all fingerprints in a database with Tanimoto coefficient to the query fingerprint above a user defined threshold. The method is based on two novel data structures for rapid screening of large databases: the kD grid and the Multibit tree. The kD grid is based on splitting the fingerprints into k shorter bitstrings and utilising these to compute bounds on the similarity of the complete bitstrings. The Multibit tree uses hierarchical clustering and similarity within each cluster to compute similar bounds. We have implemented our method and tested it on a large real-world data set. Our experiments show that our method yields approximately a three-fold speed-up over previous methods. Conclusions Using the novel kD grid and Multibit tree significantly reduce the time needed for searching databases of fingerprints. This will allow researchers to (1) perform more searches than previously possible and (2) to easily search large databases.