Jisuanji kexue (May 2023)

Code Embedding Method Based on Neural Network

  • SUN Xuekai, JIANG Liehui

DOI
https://doi.org/10.11896/jsjkx.220100094
Journal volume & issue
Vol. 50, no. 5
pp. 64 – 71

Abstract

Code analysis has many application scenarios, such as code plagiarism detection and software vulnerability search. With the development of artificial intelligence, neural network techniques have been widely applied to code analysis. However, existing methods either treat code as ordinary natural language text or use much more complex rules to sample the code. The former tends to lose key information in the code, while the latter makes the algorithm overly complicated and model training time-consuming. Alon et al. proposed the Code2vec algorithm, which has significant advantages over previous code analysis methods, but Code2vec still has some limitations. This paper therefore proposes a code embedding method based on neural networks. The main idea is to represent a code function as an embedding vector. First, the function is decomposed into a series of abstract syntax tree paths; then a neural network learns how to represent each path; finally, all path representations are aggregated into a single embedding vector that represents the function. A prototype system based on this method is implemented. Experimental results show that, compared with Code2vec, the new algorithm has a simpler structure and trains faster.
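
The three steps named in the abstract (decompose a function into abstract syntax tree paths, represent each path, aggregate the paths into one vector) can be illustrated with a minimal Python sketch. This is not the authors' implementation, and several choices are assumptions made only for illustration: Python's own `ast` module stands in for the parser, only `Name`, `Constant` and `arg` nodes count as path endpoints, the `^` (up) and `_` (down) path notation is borrowed from Code2vec, a hash-seeded pseudo-random vector stands in for the learned path representation, and plain averaging stands in for the learned aggregation. All function names (`terminal_trails`, `extract_paths`, `placeholder_vector`, `embed_function`) are hypothetical.

```python
import ast
import hashlib
import random
from itertools import combinations

# Assumption: these node types are treated as the terminals (path endpoints).
TERMINALS = (ast.Name, ast.Constant, ast.arg)


def terminal_trails(node, prefix=()):
    """Yield the root-to-terminal chain of AST nodes for every terminal."""
    trail = prefix + (node,)
    if isinstance(node, TERMINALS):
        yield trail
        return  # do not descend below a terminal (e.g. into ctx or annotations)
    for child in ast.iter_child_nodes(node):
        yield from terminal_trails(child, trail)


def token(node):
    """Return the surface token carried by a terminal node."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.arg):
        return node.arg
    return repr(node.value)


def extract_paths(source):
    """Return (start_token, path, end_token) triples for every terminal pair."""
    trails = list(terminal_trails(ast.parse(source)))
    triples = []
    for a, b in combinations(trails, 2):
        # Walk down from the root while both trails share the same node.
        # Terminals are never ancestors of each other, so k stays in range.
        k = 0
        while a[k] is b[k]:
            k += 1
        up = [type(n).__name__ for n in reversed(a[k:])]
        top = type(a[k - 1]).__name__  # lowest common ancestor
        down = [type(n).__name__ for n in b[k:]]
        path = "^".join(up + [top]) + "_" + "_".join(down)
        triples.append((token(a[-1]), path, token(b[-1])))
    return triples


def placeholder_vector(triple, dim=8):
    """Deterministic pseudo-random vector; a stand-in for a LEARNED embedding."""
    digest = hashlib.sha256("|".join(triple).encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def embed_function(source, dim=8):
    """Average all path vectors (assumed aggregation, not the paper's)."""
    vectors = [placeholder_vector(t, dim) for t in extract_paths(source)]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


SAMPLE = "def add(x, y):\n    return x + y\n"
```

For `SAMPLE`, `extract_paths` yields one triple per pair of terminals, for example `("x", "Name^BinOp_Name", "y")` for the two operands of the addition, and `embed_function(SAMPLE)` returns a single fixed-length vector for the whole function. In a trained model the placeholder vectors and the averaging step would be replaced by learned components.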

Keywords