A Comparative Study of Deep Learning-Based Vulnerability Detection System

Zhen Li; Deqing Zou; Jing Tang; Zhihao Zhang; Mingqian Sun; Hai Jin

doi:10.1109/ACCESS.2019.2930578

IEEE Access (Jan 2019)

Affiliations

Zhen Li: National Engineering Research Center for Big Data Technology and System, Services Computing Technology and System Lab, Cluster and Grid Computing Lab, Big Data Security Engineering Research Center, Huazhong University of Science and Technology, Wuhan, China
Deqing Zou: National Engineering Research Center for Big Data Technology and System, Services Computing Technology and System Lab, Cluster and Grid Computing Lab, Big Data Security Engineering Research Center, Huazhong University of Science and Technology, Wuhan, China
Jing Tang: National Engineering Research Center for Big Data Technology and System, Services Computing Technology and System Lab, Cluster and Grid Computing Lab, Big Data Security Engineering Research Center, Huazhong University of Science and Technology, Wuhan, China
Zhihao Zhang: School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China
Mingqian Sun: School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China
Hai Jin: National Engineering Research Center for Big Data Technology and System, Services Computing Technology and System Lab, Cluster and Grid Computing Lab, Big Data Security Engineering Research Center, Huazhong University of Science and Technology, Wuhan, China

DOI: https://doi.org/10.1109/ACCESS.2019.2930578
Journal volume & issue: Vol. 7, pp. 103184–103197

Abstract

Source code static analysis has been widely used to detect vulnerabilities in the development of software products. Vulnerability patterns defined purely by human experts are laborious to produce and error-prone, which has motivated the use of machine learning for vulnerability detection. To relieve human experts of defining vulnerability rules or features, a recent study shows the feasibility of leveraging deep learning to detect vulnerabilities automatically. However, the impact of different factors on the effectiveness of vulnerability detection is unknown. In this paper, we collect two datasets from programs involving 126 types of vulnerabilities, on which we conduct the first comparative study to quantitatively evaluate the impact of different factors on the effectiveness of vulnerability detection. The experimental results show that: accommodating control dependency can increase the overall effectiveness of vulnerability detection, in terms of F1-measure, by 20.3%; the imbalanced data processing methods are not effective for the dataset we create; bidirectional recurrent neural networks (RNNs) are more effective than unidirectional RNNs and convolutional neural networks, which in turn are more effective than multi-layer perceptrons; and using the last output corresponding to the time step for the bidirectional long short-term memory (BLSTM) can reduce the false negative rate by 2.0% at the price of increasing the false positive rate by 0.5%.
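
The abstract reports its results as F1-measure, false negative rate, and false positive rate. As a minimal sketch using the standard definitions of these metrics (not the authors' evaluation code), they can be computed from confusion-matrix counts as follows. The function name `detection_metrics` and the counts in the usage line are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute standard binary-detection metrics from confusion-matrix counts.

    tp: vulnerable samples flagged as vulnerable
    fp: non-vulnerable samples flagged as vulnerable
    tn: non-vulnerable samples flagged as non-vulnerable
    fn: vulnerable samples flagged as non-vulnerable
    """
    # False positive rate: share of non-vulnerable samples wrongly flagged.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    # False negative rate: share of vulnerable samples that were missed.
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1-measure: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"fpr": fpr, "fnr": fnr, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, for illustration only.
metrics = detection_metrics(tp=80, fp=10, tn=90, fn=20)
print(metrics)
```

With these definitions, lowering the false negative rate usually means flagging more samples, which tends to raise the false positive rate. That is the kind of trade-off the abstract describes for the BLSTM output choice.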

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
