Securing Code With Context: Enhancing Vulnerability Detection Through Contextualized Graph Representations

Muhammad Fakhrur Rozi; Tao Ban; Seiichi Ozawa; Akira Yamada; Takeshi Takahashi; Daisuke Inoue

doi:10.1109/ACCESS.2024.3467180

IEEE Access (Jan 2024)

Affiliations

Muhammad Fakhrur Rozi: National Institute of Information and Communications Technology, Koganei, Japan
Tao Ban: National Institute of Information and Communications Technology, Koganei, Japan
Seiichi Ozawa: Center for Mathematical and Data Sciences, Kobe University, Kobe, Japan
Akira Yamada: Graduate School of Engineering, Kobe University, Kobe, Japan
Takeshi Takahashi: National Institute of Information and Communications Technology, Koganei, Japan
Daisuke Inoue: National Institute of Information and Communications Technology, Koganei, Japan

DOI: https://doi.org/10.1109/ACCESS.2024.3467180
Journal volume & issue: Vol. 12, pp. 142101–142126

Abstract

Detecting source code vulnerabilities is a critical challenge in secure software development. Identifying vulnerabilities early helps keep software performance and security uncompromised. However, existing vulnerability detection methods often struggle to capture the semantic meaning of source code, particularly for vulnerability types that require a deeper understanding of code flow and context. This work addresses this challenge by introducing ContextCPG, a novel enhancement of the code property graph (CPG) representation. ContextCPG augments the CPG with additional information about variable names and data types within the source code. Our approach combines natural language processing with graph-based analysis to capture a richer context surrounding the source code, drawing on both the structural features of the graph representation and the names, types, and values of its nodes, which are analyzed as natural language. These additional features enhance the capability of graph neural network models to capture the semantic meaning of the source code and better detect vulnerabilities. We evaluate ContextCPG on three selected C/C++ vulnerability types (buffer overflow, invalid input, and use-after-free) and compare its performance against the CPG. The results show that ContextCPG consistently outperforms the CPG on all three vulnerability types, with an average accuracy increase of 8%. These findings demonstrate the value of providing supplementary information within the graph representation for vulnerability detection.

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
