Efficient top-k string similarity query algorithms

Zi-yang CHEN; Yu-jun HAN; Xuan WANG; Jun-feng ZHOU

Tongxin xuebao (Dec 2014)

Journal volume & issue: Vol. 35, pp. 10–20

Abstract

This paper studies the computation of top-k similar strings under edit distance: given a query string σ and a string set S, find the k strings in S that are most similar to σ by edit distance. First, two adaptive filter strategies based on a length-skip index are proposed to reduce the number of edit distance computations between pairs of strings. Then, a lower bound on the edit distance between the query string and the set of unmatched strings is proposed, which further reduces the number of edit distance computations when processing strings that share no signature with the query string. Finally, efficient algorithms that return the top-k similar strings are proposed. Experimental results on three real datasets verify the benefits over the state-of-the-art algorithm.
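
The problem statement above can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not describe the length-skip index or the signature scheme in enough detail to reproduce them. The sketch uses only the standard length-difference lower bound (the edit distance between two strings is at least the difference of their lengths) to skip candidates, plus a bounded dynamic program that stops early. The function names `edit_distance` and `top_k_similar` are hypothetical and chosen for illustration.

```python
import heapq


def edit_distance(a, b, bound=None):
    """Levenshtein distance between a and b.

    If `bound` is given and the true distance exceeds it, some value
    greater than `bound` is returned (not necessarily the exact distance).
    """
    if len(a) < len(b):
        a, b = b, a
    # Length-difference lower bound: ed(a, b) >= |len(a) - len(b)|.
    if bound is not None and len(a) - len(b) > bound:
        return bound + 1
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        # Row minima never decrease, so the final distance is >= min(cur).
        if bound is not None and min(cur) > bound:
            return bound + 1
        prev = cur
    return prev[-1]


def top_k_similar(query, strings, k):
    """Return up to k (distance, string) pairs closest to `query`."""
    if k <= 0:
        return []
    # Visit candidates in order of increasing length difference (assumption:
    # a simple stand-in for an index organised by string length).
    order = sorted(strings, key=lambda s: abs(len(s) - len(query)))
    heap = []  # max-heap on distance, stored as (-distance, string)
    for s in order:
        if len(heap) < k:
            heapq.heappush(heap, (-edit_distance(query, s), s))
            continue
        worst = -heap[0][0]
        # Every remaining string has a length difference >= this one,
        # so none of them can be strictly closer than the current worst.
        if abs(len(s) - len(query)) >= worst:
            break
        d = edit_distance(query, s, bound=worst - 1)
        if d < worst:
            heapq.heapreplace(heap, (-d, s))
    return sorted((-neg_d, s) for neg_d, s in heap)


words = ["sitting", "mitten", "kitchen", "bitten", "written", "smitten", "kit"]
print(top_k_similar("kitten", words, 2))  # [(1, 'bitten'), (1, 'mitten')]
```

In this example the loop stops after the two strings of length 6, because every remaining string differs in length from the query by at least 1, which already equals the worst distance in the current top-2. Ties with the current worst result are not replaced, so which of several equally distant strings is returned depends on visiting order.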

Keywords: string similarity; asymmetric signature scheme; length-skip index

Published in Tongxin xuebao

ISSN: 1000-436X (Print)
Publisher: Editorial Department of Journal on Communications
Country of publisher: China
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Telecommunication
Website: http://www.infocomm-journal.com/txxb/EN/1000-436X/home.shtml
