Tongxin xuebao (Jul 2015)
HybridFA:a memory reduction technique for the AC automata based on statistics
Abstract
Despite the fast speed in multiple string matching tasks,the advanced Aho-Corasick(AC) automata wastes storage memory to a great extent.Study indicated that the automata states have specific statistical access characteristics in practice.Accordingly,a series of algorithms based on statistical characteristics for building hybrid finite automata,named HybridFA,are proposed.This work completes partial states of the AC automata according to different features,including access frequency,state hierarchy,and combined characteristics respectively.Experimental results on the real-world datasets like Snort,ClamAV,and URL show that the storage space of HybridAC is reduced to less than 5% of the space cost by the advanced AC automata.Furthermore,HybridFA based on combined characteristics achieves the superior performance on matching speed and robustness comparing to other proposed algorithms.