IEEE Access (Jan 2019)

A Comparative Study of Deep Learning-Based Vulnerability Detection System

  • Zhen Li,
  • Deqing Zou,
  • Jing Tang,
  • Zhihao Zhang,
  • Mingqian Sun,
  • Hai Jin

DOI
https://doi.org/10.1109/ACCESS.2019.2930578
Journal volume & issue
Vol. 7
pp. 103184 – 103197

Abstract

Static analysis of source code is widely used to detect vulnerabilities during the development of software products. Vulnerability patterns defined purely by human experts are laborious to produce and error-prone, which has motivated the use of machine learning for vulnerability detection. To relieve human experts of defining vulnerability rules or features, a recent study showed the feasibility of leveraging deep learning to detect vulnerabilities automatically. However, the impact of different factors on the effectiveness of vulnerability detection is unknown. In this paper, we collect two datasets from programs involving 126 types of vulnerabilities, on which we conduct the first comparative study to quantitatively evaluate the impact of different factors on the effectiveness of vulnerability detection. The experimental results show that accommodating control dependency can increase the F1-measure of vulnerability detection by 20.3%; the imbalanced data processing methods are not effective for the dataset we create; bidirectional recurrent neural networks (RNNs) are more effective than unidirectional RNNs and convolutional neural networks, which in turn are more effective than the multi-layer perceptron; and using the last output corresponding to the time step for the bidirectional long short-term memory (BLSTM) can reduce the false negative rate by 2.0% at the price of increasing the false positive rate by 0.5%.
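
The abstract reports its results as F1-measure, false positive rate, and false negative rate. For readers less familiar with these metrics, the sketch below computes them from confusion-matrix counts using the standard definitions. The function name `detection_metrics` and the example counts are illustrative assumptions, not taken from the paper or its datasets.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from raw counts.

    tp: vulnerable samples flagged as vulnerable
    fp: non-vulnerable samples flagged as vulnerable
    tn: non-vulnerable samples flagged as non-vulnerable
    fn: vulnerable samples flagged as non-vulnerable
    """
    # False positive rate: share of non-vulnerable samples wrongly flagged.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    # False negative rate: share of vulnerable samples that were missed.
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    # Precision: share of flagged samples that are truly vulnerable.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall (true positive rate): share of vulnerable samples detected.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1-measure: harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"fpr": fpr, "fnr": fnr, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = detection_metrics(tp=80, fp=10, tn=90, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because the false negative rate equals one minus recall, a configuration that lowers the false negative rate while raising the false positive rate, as reported for the BLSTM variant, trades higher recall for lower precision.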

Keywords