Tongxin xuebao (Dec 2016)
FilterFA: a multiple string matching algorithm based on specification of character set
Abstract
Multiple string matching is one of the core techniques of intrusion detection system, where Aho-Corasick al-gorithm is widely used. To solve the problem that huge storage overhead of AC would influence performance deeply, an improved algorithm ——FilterFA, based on specification of character set was proposed. This algorithm compressed large character by the character set mapping function, and constructed a new automata based on the mapped character set,then space complexity decreased to O(|P||Σ′|). Experiments on synthetic datasets and real-world datasets (such as ClamAV) show that the storage overhead of FilterFA is only about 3% of that of AC, while the size of the character set is 8, and the false recognition rate is less than 2%.