Современные информационные технологии и IT-образование (May 2020)
Constrained Approximate Search Algorithms in Knowledge Discovery
Abstract
Knowledge discovery in big data is one of the most important applications of computing machinery today. Search is essential part of all such procedures. Search algorithms must be extremely efficient, but at the same time knowledge discovery procedures must not produce too many false positives or false negatives. False positives require post-processing, which reduces the overall efficiency of the knowledge discovery procedures, while false negatives reduce the sensitivity of such procedures. To reduce the false positive and false negative rate, in this paper, constrained approximate search algorithms are proposed to be applied. An overview of search theory, exact and approximate, is given first, exposing fundamentals of dynamic programming-based and bit-parallel-based approximate search algorithms without constraints. Then, introduction of constraints specific for various knowledge discovery procedures is explained, together with the subtleties of various applications, such as SPAM filtering and digital and network forensics (file carving, intrusion detection in hosts and networks). Advantages and disadvantages of applications of such constrained search algorithms in knowledge discovery procedures are also discussed. A potential application in bioinformatics is outlined.
Keywords