Tongxin xuebao (Dec 2014)

Efficient top-k string similarity query algorithms

  • Zi-yang CHEN,
  • Yu-jun HAN,
  • Xuan WANG,
  • Jun-feng ZHOU

Journal volume & issue
Vol. 35
pp. 10 – 20

Abstract

The problem of computing top-k similar strings based on edit distance is studied, i.e., given a query string σ and a string set S, find the k strings in S that are most similar to σ under edit distance. First, two adaptive filter strategies based on a length-skip index are proposed to reduce the number of edit distance computations between pairs of strings. Then, a lower bound on the edit distance between the query string and the set of unmatched strings is proposed, which further reduces the number of edit distance computations when processing strings that share no common signatures with the query string. Finally, efficient algorithms for returning the top-k similar strings are proposed. Experimental results on three real datasets verify the benefits over the state-of-the-art algorithm.
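
To make the problem statement concrete, the following is a minimal baseline sketch of a top-k query under edit distance. It is not the length-skip index or the signature-based lower bound of the paper, whose details the abstract does not give. It relies only on the well-known fact that the edit distance between two strings is at least the difference of their lengths, and it uses that bound to skip computations. The names `edit_distance` and `top_k_similar` are hypothetical and chosen for illustration.

```python
import heapq


def edit_distance(a, b):
    """Levenshtein distance computed with two-row dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string on the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def top_k_similar(query, strings, k):
    """Return up to k (distance, string) pairs closest to query.

    Assumption for illustration: only the length-difference lower bound
    is used for pruning, not the filters proposed in the paper.
    """
    if k <= 0:
        return []
    # Visit strings in order of increasing length difference, so the
    # lower bound never decreases and the scan can stop early.
    candidates = sorted(strings, key=lambda s: abs(len(s) - len(query)))
    heap = []  # max-heap on distance, stored as (-distance, string)
    for s in candidates:
        lower_bound = abs(len(s) - len(query))
        if len(heap) == k and lower_bound >= -heap[0][0]:
            break  # no remaining string can beat the current k-th best
        d = edit_distance(query, s)
        if len(heap) < k:
            heapq.heappush(heap, (-d, s))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, s))
    return sorted((-neg_d, s) for neg_d, s in heap)


if __name__ == "__main__":
    data = ["sitting", "mitten", "kitchen", "a", "kitten", "bitten"]
    print(top_k_similar("kitten", data, 2))
```

In this sketch the scan stops as soon as the length difference of the next candidate reaches the current k-th best distance, which is the simplest form of the idea of avoiding edit distance computations that cannot change the answer.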

Keywords