IEEE Access (Jan 2021)

Pattern Matching in YARA: Improved Aho-Corasick Algorithm

  • Dominika Regeciova,
  • Dusan Kolar,
  • Marek Milkovic

DOI
https://doi.org/10.1109/ACCESS.2021.3074801
Journal volume & issue
Vol. 9
pp. 62857 – 62866

Abstract

YARA is a pattern-matching tool used by malware analysts worldwide. It can scan files as well as process memory, and it lets users define sequences of symbols as text strings, hexadecimal strings, and regular expressions. The use of regular expressions is limited, however, because of the concern that they can slow down scanning. In this paper, we analyze how regular expressions behave in YARA and how they are implemented. We identify several reasons why regular expressions can slow down scanning, each rooted in the nature of the Aho-Corasick algorithm that YARA uses. We propose a new version of this algorithm and implement it in the original tool. Experiments show that the speed of pattern matching with regular expressions can indeed be improved: in selected cases, the proposed version was about 27% faster than the original. In instances where strings had been optimized for the original version, the two versions performed comparably.
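
For readers unfamiliar with the algorithm named above, the following is a minimal sketch of the textbook Aho-Corasick automaton for plain, non-empty string patterns. It is neither YARA's implementation nor the improved version proposed in the paper, and it does not handle hexadecimal strings or regular expressions. The function names `build_automaton` and `find_all` are hypothetical and chosen only for illustration.

```python
from collections import deque


def build_automaton(patterns):
    """Build a textbook Aho-Corasick automaton (illustrative sketch only).

    Assumes every pattern is a non-empty string.
    Returns three parallel lists indexed by state number.
    """
    goto = [{}]   # goto[state][char] -> next state (trie edges)
    fail = [0]    # fail[state] -> state for the longest proper suffix in the trie
    out = [[]]    # out[state] -> patterns that end at this state

    # Phase 1: insert all patterns into a trie.
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)

    # Phase 2: compute failure links breadth-first.
    # States at depth 1 keep the root as their failure state.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the failure state.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text, patterns):
    """Return (start_index, pattern) for every occurrence, in one pass over text."""
    goto, fail, out = build_automaton(patterns)
    state = 0
    matches = []
    for i, ch in enumerate(text):
        # Follow failure links until some state has an edge for ch.
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            matches.append((i - len(pat) + 1, pat))
    return matches
```

Calling `find_all("ushers", ["he", "she", "his", "hers"])` reports "she" at index 1 and both "he" and "hers" at index 2. The text is read once regardless of how many patterns are searched, which is why the algorithm suits scanning against many signatures at once.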

Keywords