Telfor Journal (Jul 2020)

Comparing Assembler Procedures by Analyzing Sequences of Opcodes

  • N. Pejić,
  • M. Cvetanović,
  • Z. Radivojević

DOI
https://doi.org/10.5937/telfor2001046P
Journal volume & issue
Vol. 12, no. 1
pp. 46 – 49

Abstract

Static analysis of executables for the purpose of comparing them becomes more difficult when the binaries are created using different compilers. To compensate for the noise introduced by the compilers, the arguments of the instructions are usually discarded as having a low signal-to-noise ratio. Because compilers often reorder instructions, some approaches compare only statistical information about the instructions, or compare their subsequences, in order to measure similarity. This paper presents an approach for estimating the similarity of procedures given in assembler form (disassembled binaries) by analyzing their sequences of opcodes. The approach first encodes the opcodes as integer values, mapping opcodes that represent similar actions to the same value, and then calculates a relative Levenshtein distance between the two sequences of integers. The proposed approach is evaluated against several existing approaches and shows, on average, around 6% higher recall than the second-best approach.
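
The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the opcode grouping in `OPCODE_CLASS`, the `UNKNOWN` fallback value, and the choice to normalize the edit distance by the length of the longer sequence are all assumptions made for illustration, since the abstract does not specify the mapping or the exact definition of "relative".

```python
# Hypothetical grouping of x86 opcodes into classes of similar actions.
# The paper's actual mapping is not given in the abstract.
OPCODE_CLASS = {
    "mov": 0, "lea": 0, "movzx": 0,                    # data movement
    "add": 1, "sub": 1, "inc": 1, "dec": 1, "imul": 1, # arithmetic
    "cmp": 2, "test": 2,                               # comparison
    "jmp": 3, "je": 3, "jne": 3, "jl": 3, "jg": 3,     # jumps
    "call": 4, "ret": 4,                               # call and return
    "push": 5, "pop": 5,                               # stack
}
UNKNOWN = -1  # assumed fallback for opcodes missing from the table


def encode(opcodes):
    """Map each opcode mnemonic to the integer of its class."""
    return [OPCODE_CLASS.get(op.lower(), UNKNOWN) for op in opcodes]


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def relative_distance(ops_a, ops_b):
    """Edit distance of the encoded sequences divided by the longer length.

    The normalization is an assumption; 0.0 means identical after encoding.
    """
    a, b = encode(ops_a), encode(ops_b)
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return levenshtein(a, b) / longest
```

Under this hypothetical mapping, `relative_distance(["push", "mov", "add", "ret"], ["push", "lea", "sub", "ret"])` evaluates to `0.0`, because `mov`/`lea` and `add`/`sub` fall into the same classes. This is the kind of compiler-induced variation the encoding step is meant to absorb.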

Keywords