IEEE Access (Jan 2024)
Securing Code With Context: Enhancing Vulnerability Detection Through Contextualized Graph Representations
Abstract
Detecting source code vulnerabilities is a critical challenge in secure software development. Identifying vulnerabilities early helps ensure that software performance and security are not compromised. However, existing vulnerability detection methods often struggle to capture the semantic meaning of source code, particularly for vulnerability types that require a deeper understanding of code flow and context. This work addresses this challenge by introducing ContextCPG, a novel enhancement of the code property graph (CPG) representation. ContextCPG augments the CPG with additional information about the variable names and data types in the source code. Our approach combines graph-based analysis with natural language processing to capture richer context around the source code. It relies both on the structural features of the graph representation and on a natural language analysis of each node's name, type, and value. These additional features improve the ability of graph neural network models to capture the semantic meaning of the source code and, in turn, to detect vulnerabilities. We evaluate ContextCPG on three selected C/C++ vulnerability types (buffer overflow, invalid input, and use-after-free) and compare its performance against the standard CPG. The results show that ContextCPG consistently outperforms the CPG on all three vulnerability types, with an average accuracy increase of 8%. ContextCPG thus demonstrates the value of providing supplementary information within the graph representation to improve vulnerability detection.
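To make the idea of "structural features plus name and type information" concrete, the following is a minimal sketch, not the authors' implementation. It assumes a simplified CPG node with a structural label, an identifier name, and a type string. The node labels are loosely modeled on Joern-style CPG labels. All names (`CpgNode`, `NODE_KINDS`, `TEXT_DIM`, `subtokens`, `node_features`) are hypothetical. A hashing-trick bag of subtokens stands in for whatever natural language embedding the paper actually uses. The sketch concatenates a one-hot node kind, which is what a plain CPG already provides, with hashed subtoken counts for the name and type, so that a GNN would receive both kinds of signal per node.

```python
import re
import zlib
from dataclasses import dataclass

# Hypothetical structural labels, loosely modeled on Joern-style CPG node labels.
NODE_KINDS = ["METHOD", "CALL", "IDENTIFIER", "LITERAL", "LOCAL", "CONTROL_STRUCTURE"]
TEXT_DIM = 16  # hypothetical size of each hashed text-feature slot


@dataclass
class CpgNode:
    kind: str            # structural label, e.g. "IDENTIFIER"
    name: str = ""       # variable or function name, e.g. "dest_buf"
    type_name: str = ""  # declared data type, e.g. "char[16]"


def subtokens(text: str) -> list[str]:
    # Split snake_case / camelCase identifiers and type strings into lowercase pieces.
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", text)
    return [p.lower() for p in parts]


def text_features(text: str, dim: int = TEXT_DIM) -> list[float]:
    # Hashing-trick bag of subtokens (a stand-in for a learned embedding).
    # zlib.crc32 is used because Python's built-in hash() is randomized per process.
    vec = [0.0] * dim
    for tok in subtokens(text):
        vec[zlib.crc32(tok.encode()) % dim] += 1.0
    return vec


def node_features(node: CpgNode) -> list[float]:
    # Structural part: one-hot over the node kind.
    kind_vec = [1.0 if node.kind == k else 0.0 for k in NODE_KINDS]
    # Contextual part: hashed name and type subtokens appended after the structure.
    return kind_vec + text_features(node.name) + text_features(node.type_name)


if __name__ == "__main__":
    plain = CpgNode("IDENTIFIER")
    ctx = CpgNode("IDENTIFIER", name="dest_buf", type_name="char[16]")
    print(len(node_features(ctx)))  # 6 + 16 + 16 = 38
```

In this sketch, two identifier nodes that a plain CPG would treat identically (`plain` and `ctx` share the same one-hot kind) become distinguishable once the name subtokens (`dest`, `buf`) and type subtokens (`char`, `16`) are appended. That is the kind of extra context the abstract argues helps a GNN separate, for example, buffer-related nodes from others.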
Keywords