IEEE Access (Jan 2020)

Vulnerable Code Detection Using Software Metrics and Machine Learning

  • Nadia Medeiros,
  • Naghmeh Ivaki,
  • Pedro Costa,
  • Marco Vieira

DOI
https://doi.org/10.1109/ACCESS.2020.3041181
Journal volume & issue
Vol. 8
pp. 219174–219198

Abstract

Software metrics are widely used indicators of software quality, and several studies have shown that such metrics can be used to estimate the presence of vulnerabilities in the code. In this paper, we present a comprehensive experiment to study how effective software metrics are at distinguishing vulnerable code units from non-vulnerable ones. To this end, we use several machine learning algorithms (Random Forest, Extreme Boosting, Decision Tree, SVM Linear, and SVM Radial) to extract vulnerability-related knowledge from software metrics collected from the source code of several representative software projects developed in C/C++ (Mozilla Firefox, Linux Kernel, Apache HTTPd, Xen, and Glibc). We consider different combinations of software metrics and diverse application scenarios with different security concerns (e.g., highly critical or non-critical systems). This experiment contributes to understanding whether software metrics can effectively be used to distinguish vulnerable code units in different application scenarios, and how machine learning algorithms can help in this regard. The main observation is that using machine learning algorithms on top of software metrics helps to indicate vulnerable code units with a relatively high level of confidence for security-critical software systems (where the focus is on detecting the maximum number of vulnerabilities, even if false positives are reported), but they are not helpful for low-criticality or non-critical systems due to the high number of false positives (which bring an additional development cost that is frequently not affordable).
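
The approach summarized above can be illustrated with a minimal sketch. This is not the authors' pipeline, only an assumed setup showing the general idea: train one of the named classifiers (Random Forest) on per-unit software metrics, then shift the decision threshold to favor recall, as one would in the security-critical scenario where false positives are tolerated. The synthetic data, feature count, and the 0.3 threshold are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Stand-in for a real per-function metrics dataset (in practice, features
# such as lines of code, cyclomatic complexity, or nesting depth would be
# extracted from the source code); vulnerable units are the rare positive class.
X, y = make_classification(n_samples=5000, n_features=10, weights=[0.9, 0.1],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                    random_state=0)

# Random Forest is one of the classifiers named in the abstract.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)

# Lowering the decision threshold trades precision for recall, mimicking the
# security-critical setting where missing a vulnerability is costlier than
# reviewing a false positive. The 0.3 threshold is an arbitrary example.
proba = clf.predict_proba(X_test)[:, 1]
for threshold in (0.5, 0.3):
    pred = (proba >= threshold).astype(int)
    print(f"threshold={threshold}: "
          f"precision={precision_score(y_test, pred):.2f}, "
          f"recall={recall_score(y_test, pred):.2f}")
```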

Keywords