Jisuanji kexue (Oct 2021)

Neural Network-based Binary Function Similarity Detection

  • FANG Lei, WEI Qiang, WU Ze-hui, DU Jiang, ZHANG Xing-ming

DOI
https://doi.org/10.11896/jsjkx.200900185
Journal volume & issue
Vol. 48, no. 10
pp. 286 – 293

Abstract

Binary code similarity detection has wide and important applications in program traceability and security auditing. In recent years, applying neural networks to binary code similarity detection has broken through the performance bottleneck that traditional detection techniques encounter in large-scale detection tasks, and code similarity detection based on neural network embeddings has gradually become a research hotspot. This paper proposes a neural network-based technique for binary function similarity detection. First, it uses a uniform intermediate representation to eliminate the differences between the instruction architectures of assembly code. Second, at the basic block level, it uses a word embedding model from natural language processing to learn the intermediate representation code and obtain a semantic embedding for each basic block. Then, at the function level, it uses an improved graph neural network model to learn the control flow information of the function while also taking the basic block semantics into account, and obtains the final function embedding. Finally, the similarity between two functions is measured by calculating the cosine distance between the two function embedding vectors. This paper also implements a prototype system based on this technique. Experiments show that the technique's representation learning process for program code avoids introducing human bias, that the improved graph neural network is better suited to learning the control flow information of functions, and that both the scalability and the detection accuracy of the system improve over existing schemes.
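
The last two stages of the pipeline described above (combining basic block embeddings along the control flow graph, then comparing function embeddings) can be illustrated with a minimal Python sketch. It is not the paper's model: the names `propagate`, `function_embedding` and `cosine_similarity` are hypothetical, the learned graph neural network is replaced by plain neighbor averaging with no trained weights, and the basic block vectors are assumed to be given rather than learned from intermediate representation code.

```python
import math


def propagate(block_vecs, edges, rounds=2):
    """Mix each basic block vector with the vectors of its CFG neighbors.

    Hypothetical stand-in for a graph neural network layer: no learned
    weights, only neighbor averaging, repeated `rounds` times.
    """
    neighbors = [[] for _ in block_vecs]
    for src, dst in edges:
        # Assumption: treat control flow edges as undirected for simplicity.
        neighbors[src].append(dst)
        neighbors[dst].append(src)

    vecs = [list(v) for v in block_vecs]
    for _ in range(rounds):
        updated = []
        for i, vec in enumerate(vecs):
            group = [vec] + [vecs[j] for j in neighbors[i]]
            # Component-wise mean over the block and its neighbors.
            updated.append([sum(col) / len(group) for col in zip(*group)])
        vecs = updated
    return vecs


def function_embedding(block_vecs, edges):
    """Sum-pool the propagated block vectors into one function vector."""
    vecs = propagate(block_vecs, edges)
    return [sum(col) for col in zip(*vecs)]


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; cosine distance is 1 minus this."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Two toy functions: each has two basic blocks joined by one edge.
f1 = function_embedding([[1.0, 0.0], [0.0, 1.0]], [(0, 1)])
f2 = function_embedding([[0.9, 0.1], [0.1, 0.9]], [(0, 1)])
similarity = cosine_similarity(f1, f2)
```

In a real system the averaging step would be replaced by trained layers, so that functions compiled from the same source for different architectures end up with nearby embeddings.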

Keywords