A Context-Aware Neural Embedding for Function-Level Vulnerability Detection

Hongwei Wei; Guanjun Lin; Lin Li; Heming Jia

doi:10.3390/a14110335

Algorithms (Nov 2021)

Affiliations

Hongwei Wei: School of Information Engineering, Sanming University, Sanming 365004, China
Guanjun Lin: School of Information Engineering, Sanming University, Sanming 365004, China
Lin Li: School of Software and Electrical Engineering, Swinburne University of Technology, Melbourne 3122, Australia
Heming Jia: School of Information Engineering, Sanming University, Sanming 365004, China

DOI: https://doi.org/10.3390/a14110335
Journal volume & issue: Vol. 14, no. 11, p. 335

Abstract

Exploitable vulnerabilities in software systems are major security concerns. To date, machine learning (ML)-based solutions have been proposed to automate and accelerate the detection of vulnerabilities. Most ML techniques aim to isolate a unit of source code, be it a line or a function, as being vulnerable. We argue that a code segment is vulnerable if it exists in certain semantic contexts, such as the control flow and data flow; therefore, it is important for the detection to be context-aware. In this paper, we evaluate the performance of mainstream word embedding techniques in the scenario of software vulnerability detection. Based on the evaluation, we propose a supervised framework that leverages pre-trained context-aware embeddings from language models (ELMo) to capture deep contextual representations, which are further summarized by a bidirectional long short-term memory (Bi-LSTM) layer for learning long-range code dependencies. The framework takes a source code function directly as input and produces a corresponding function embedding, which can be treated as a feature set for conventional ML classifiers. Experimental results showed that the proposed framework yielded the best performance in its downstream detection tasks. Using the feature representations generated by our framework, random forest and support vector machine classifiers outperformed four baseline systems on our data sets, demonstrating that the framework incorporating ELMo can effectively capture vulnerable data flow patterns and facilitate the vulnerability detection task.
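
The data flow described in the abstract (token embeddings, summarized by a Bi-LSTM into one fixed-length function embedding) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: `token_vector` is a hypothetical placeholder that returns a fixed pseudo-random vector per token instead of a contextual ELMo embedding, the LSTM weights are random and untrained, biases are omitted, and all names and sizes (`EMB_DIM`, `HIDDEN`, `function_embedding`) are assumptions chosen for illustration.

```python
import math
import random
import re

EMB_DIM = 8  # size of each token vector (illustrative assumption)
HIDDEN = 4   # LSTM hidden size per direction (illustrative assumption)

def tokenize(code):
    # Crude lexer: identifiers, numbers, and single punctuation characters.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)

def token_vector(token):
    # Placeholder for a contextual embedding such as ELMo: a fixed
    # pseudo-random vector per token string (context-independent).
    rng = random.Random(token)
    return [rng.uniform(-1.0, 1.0) for _ in range(EMB_DIM)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def make_lstm(seed):
    # One weight matrix per gate (input, forget, output, candidate),
    # each of shape HIDDEN x (EMB_DIM + HIDDEN). Random, untrained, no biases.
    rng = random.Random(seed)
    return [[[rng.uniform(-0.5, 0.5) for _ in range(EMB_DIM + HIDDEN)]
             for _ in range(HIDDEN)] for _ in range(4)]

def run_lstm(weights, vectors):
    # Standard LSTM recurrence; returns the final hidden state.
    h = [0.0] * HIDDEN
    c = [0.0] * HIDDEN
    for x in vectors:
        z = x + h  # concatenate input with previous hidden state
        gates = [[sum(w * v for w, v in zip(row, z)) for row in gate]
                 for gate in weights]
        i = [sigmoid(a) for a in gates[0]]
        f = [sigmoid(a) for a in gates[1]]
        o = [sigmoid(a) for a in gates[2]]
        g = [math.tanh(a) for a in gates[3]]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h

FWD = make_lstm(0)  # forward-direction weights
BWD = make_lstm(1)  # backward-direction weights

def function_embedding(code):
    # Read the token sequence in both directions and concatenate the
    # two final hidden states into one fixed-length vector.
    vectors = [token_vector(t) for t in tokenize(code)]
    forward = run_lstm(FWD, vectors)
    backward = run_lstm(BWD, vectors[::-1])
    return forward + backward

sample = "int f(char *s) { char b[8]; strcpy(b, s); return 0; }"
embedding = function_embedding(sample)
print(len(embedding))
```

In a setup like the one the abstract describes, vectors of this kind would serve as the feature set passed to a conventional classifier such as a random forest or a support vector machine; here they carry no learned signal because nothing has been trained.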

Published in Algorithms

ISSN: 1999-4893 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.mdpi.com/journal/algorithms
