IEEE Access (Jan 2023)

Binary Code Vulnerability Detection Based on Multi-Level Feature Fusion

  • Guangli Wu
  • Huili Tang

DOI
https://doi.org/10.1109/ACCESS.2023.3289001
Journal volume & issue
Vol. 11
pp. 63904 – 63915

Abstract

Software vulnerabilities can lead to serious network attacks and information leakage. Timely and accurate detection of vulnerabilities in software has therefore become a research focus in the security field. Most existing work considers only instruction-level features, which to some extent overlooks syntactic and semantic information in the assembly code segments and reduces the accuracy of the detection model. In this paper, we propose a binary code vulnerability detection model based on multi-level feature fusion. The model considers both word-level features and instruction-level features. Because traditional text embedding methods cannot handle polysemy, this paper uses the Embeddings from Language Models (ELMo) model to obtain dynamic word vectors that capture word semantics and other information. To account for the grammatical structure of the assembly code segment, the model represents the normalized assembly code segment using random embedding. The model then uses a bidirectional Gated Recurrent Unit (GRU) to extract word-level and instruction-level sequence features separately, and a weighted feature fusion method is used to study the impact of the different sequence features on model performance. During training, standard deviation regularization is added to constrain the model parameters and help prevent overfitting. To evaluate the proposed method, we conduct experiments on two datasets. Our method achieves an F1-score of 98.9 percent on the Juliet Test Suite dataset and an F1-score of 87.7 percent on the NDSS18 (Whole) dataset. The experimental results show that the model can improve the accuracy of binary code vulnerability detection.
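
The abstract does not give the formulas for the weighted feature fusion or the standard deviation regularization, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that fusion is a convex combination of a word-level and an instruction-level feature vector controlled by a single weight, and that the regularizer adds a penalty proportional to the standard deviation of the parameters. The names weighted_fusion, std_penalty, alpha, and lam are hypothetical, and the feature values are made up for illustration.

```python
import math


def weighted_fusion(word_feat, instr_feat, alpha):
    """Fuse two equal-length feature vectors as alpha*word + (1-alpha)*instr.

    Assumption: a single scalar weight in [0, 1]; the paper may use a
    different weighting scheme.
    """
    if len(word_feat) != len(instr_feat):
        raise ValueError("feature vectors must have the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * w + (1.0 - alpha) * i for w, i in zip(word_feat, instr_feat)]


def std_penalty(params, lam):
    """Return lam times the population standard deviation of params.

    Assumption: the penalty is added to the training loss; the exact form
    used in the paper is not stated in the abstract.
    """
    if not params:
        raise ValueError("params must not be empty")
    mean = sum(params) / len(params)
    variance = sum((p - mean) ** 2 for p in params) / len(params)
    return lam * math.sqrt(variance)


# Illustrative values only, standing in for bidirectional GRU outputs.
word_feat = [0.2, 0.8, 0.4]
instr_feat = [0.6, 0.0, 0.4]

fused = weighted_fusion(word_feat, instr_feat, alpha=0.5)
penalty = std_penalty([0.0, 2.0], lam=0.5)
```

Sweeping alpha from 0 to 1 in such a setup is one simple way to examine how much each feature level contributes, which is the kind of study the abstract describes.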

Keywords