IEEE Access (Jan 2024)
On the Code Vulnerability Detection Based on Deep Learning: A Comparative Study
Abstract
Deep learning is an important approach to detecting and fixing vulnerabilities in software programs. Two questions are central to this approach: how to represent code information, and how to use artificial intelligence methods to learn code semantics and related information. Source-code-based vulnerability mining is usually combined with compiler-related techniques that abstract program representations through lexical, syntactic, and semantic analyses, and then with data flow analysis, control flow analysis, static symbolic execution, and other techniques to verify the existence of vulnerabilities and locate code defects. To compare the capabilities of vulnerability detection methods, we first categorize them into two main types according to their intermediate representations: sequence-based and graph-based methods. We then divide sequence-based methods into four categories and distinguish graph-based methods by whether they employ slicing techniques. Next, through the analysis of specific examples, we compare the advantages and disadvantages of the two types of methods and explore the differences and similarities in the neural networks they use. Lastly, we conduct a comparative analysis of the datasets used by these methods, highlight some challenges in this field, and present our thoughts on potential research directions.
Keywords