Deep learning vulnerability detection method based on optimized inter-procedural semantics of programs

Yan LI, Weizhong QIANG, Zhen LI, Deqing ZOU, Hai JIN

doi:10.11959/j.issn.2096-109x.2023085

Chinese Journal of Network and Information Security (网络与信息安全学报) (Dec 2023)

Affiliations

Yan LI, Weizhong QIANG, Zhen LI, Deqing ZOU, Hai JIN

DOI: https://doi.org/10.11959/j.issn.2096-109x.2023085
Journal volume & issue: Vol. 9, no. 6
pp. 86 – 101

Abstract

In recent years, software vulnerabilities have caused numerous security incidents, and discovering and patching them early can effectively reduce losses. Traditional rule-based vulnerability detection methods rely on rules defined by experts and suffer from a high false negative rate. Deep learning-based methods can automatically learn latent features of vulnerable programs. However, as software complexity increases, the precision of these methods decreases. On the one hand, current methods mostly operate at the function level and therefore cannot handle inter-procedural vulnerability samples. On the other hand, models such as BGRU and BLSTM degrade on long input sequences and are poor at capturing long-term dependencies between program statements. To address these issues, the existing program slicing method was optimized: combining intra-procedural and inter-procedural slicing enables a comprehensive contextual analysis of vulnerabilities triggered across functions and captures the complete causal chain of a vulnerability trigger. Vulnerability detection was then performed with a Transformer neural network architecture equipped with a multi-head attention mechanism. This architecture jointly attends to information from different representation subspaces, allowing deep features to be extracted from nodes. Unlike recurrent neural networks, this approach avoids information decay and effectively learns the syntactic and semantic information of the source program. Experimental results show that the method achieves an F1 score of 73.4% on a real software dataset, an improvement of 13.6% to 40.8% over the compared methods. It also successfully detected several vulnerabilities in open-source software, confirming its effectiveness and applicability.

Published in Chinese Journal of Network and Information Security (网络与信息安全学报)

ISSN: 2096-109X (Print)
Publisher: POSTS&TELECOM PRESS Co., LTD
Country of publisher: China
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://www.infocomm-journal.com/cjnis/CN/2096-109X/home.shtml
