IEEE Access (Jan 2020)

A Lightweight Cross-Version Binary Code Similarity Detection Based on Similarity and Correlation Coefficient Features

  • Hui Guo,
  • Shuguang Huang,
  • Cheng Huang,
  • Min Zhang,
  • Zulie Pan,
  • Fan Shi,
  • Hui Huang,
  • Donghui Hu,
  • Xiaoping Wang

DOI
https://doi.org/10.1109/ACCESS.2020.3004813
Journal volume & issue
Vol. 8
pp. 120501 – 120512

Abstract

Binary code similarity detection (BCSD) has been applied in many fields, such as malware detection, plagiarism detection, and vulnerability search. Existing solutions to the BCSD problem usually either compare specific features derived from the control flow graphs (CFGs) of binary functions, or compute embedding vectors of binary functions and solve the problem with deep learning algorithms. In this paper, we take a different research perspective and propose a new, lightweight method for the cross-version BCSD problem based on multiple features. The method transforms binary functions into vectors and signals and computes their similarity coefficient and correlation coefficient values. Without relying on the CFGs of functions, deep learning algorithms, or other related attributes, the method works directly on the raw bytes of each binary, and it can serve as an alternative for coping with the various complex situations that arise in real-world environments. We implement the method and evaluate it on a custom dataset of about 423,282 samples. The results show that the method performs well on cross-version BCSD: its recall can reach 96.63%, which is almost the same as that of the state-of-the-art static solution.
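The general idea of comparing two functions through their raw bytes, once as a vector and once as a signal, can be illustrated with a minimal sketch. The abstract does not specify which features or coefficients the authors use, so everything below is an assumption chosen for illustration: the vector is a normalized byte-value histogram, the similarity coefficient is cosine similarity, and the correlation coefficient is Pearson correlation over the byte sequences truncated to a common length. The function names are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def byte_histogram(code: bytes) -> list:
    """Vector view (assumed feature): relative frequency of each byte value 0..255."""
    counts = Counter(code)
    total = len(code) or 1  # avoid division by zero for empty input
    return [counts.get(b, 0) / total for b in range(256)]


def cosine_similarity(u: list, v: list) -> float:
    """Similarity coefficient (assumed choice): cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def pearson_correlation(x: bytes, y: bytes) -> float:
    """Signal view (assumed choice): Pearson correlation of two byte sequences.

    The sequences are truncated to the shorter length; a real system would
    more likely resample or align them, which is not attempted here.
    """
    n = min(len(x), len(y))
    if n < 2:
        return 0.0
    xs, ys = x[:n], y[:n]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant signal has no defined correlation
    return cov / math.sqrt(var_x * var_y)


def compare_functions(f1: bytes, f2: bytes) -> tuple:
    """Return (similarity coefficient, correlation coefficient) for two functions' raw bytes."""
    similarity = cosine_similarity(byte_histogram(f1), byte_histogram(f2))
    correlation = pearson_correlation(f1, f2)
    return similarity, correlation
```

In such a scheme, two versions of the same function would be expected to score high on both values, and a decision threshold on the pair would have to be tuned on labeled data. How the paper combines its features is not stated in the abstract.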

Keywords