Similarity-Based Source Code Vulnerability Detection Leveraging Transformer Architecture: Harnessing Cross-Attention for Hierarchical Analysis

Sungmin Han; Miju Kim; Jaesik Kang; Kwangsoo Kim; Seungwoon Lee; Sangkyun Lee

doi:10.1109/ACCESS.2024.3474857

IEEE Access (Jan 2024)

Affiliations

Sungmin Han: School of Cybersecurity, Korea University, Seoul, Republic of Korea
Miju Kim: School of Cybersecurity, Korea University, Seoul, Republic of Korea
Jaesik Kang: Cyber Warfare Research and Development Laboratory, LIG Nex1, Seongnam-si, Republic of Korea
Kwangsoo Kim: Cyber Warfare Research and Development Laboratory, LIG Nex1, Seongnam-si, Republic of Korea
Seungwoon Lee: Cyber Warfare Research and Development Laboratory, LIG Nex1, Seongnam-si, Republic of Korea
Sangkyun Lee: School of Cybersecurity, Korea University, Seoul, Republic of Korea

DOI: https://doi.org/10.1109/ACCESS.2024.3474857
Journal volume & issue: Vol. 12
pp. 150295 – 150307

Abstract

The growing complexity and volume of modern software have led to an increase in source code vulnerabilities, posing significant security risks. In response, deep learning-based methods for automated source code vulnerability detection, particularly those based on source code similarity analysis, have recently emerged as promising solutions. However, existing similarity-based methods frequently fail to fully utilize information from the hierarchical structure of source code and are often computationally expensive, which limits their practicality in real-world scenarios. In this paper, we introduce XTransformer, a novel deep learning-based source code vulnerability detector that compares target source code against archived vulnerable code at multiple levels of the source code’s hierarchical structure, using additional cross-attention added to the transformer architecture. We also propose a specialized training strategy based on supervised contrastive learning to improve XTransformer’s ability to distinguish vulnerable from non-vulnerable source code. Comprehensive experiments demonstrate that XTransformer outperforms current state-of-the-art methods across different datasets and code lengths, while significantly reducing inference time compared to other similarity-based methods that utilize hierarchical information from source code.
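
The two generic building blocks named in the abstract, cross-attention and a supervised contrastive loss, can be illustrated with a minimal sketch. This is not the XTransformer implementation: the abstract does not specify the architecture, so the function names (cross_attention, supcon_loss), the toy two-dimensional embeddings, and the temperature value are all assumptions made for illustration, and the learned query/key/value projections a real transformer would use are omitted.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(row):
    # Subtract the max for numerical stability.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(target, archived):
    """Scaled dot-product cross-attention (hypothetical helper).

    Each target-code token embedding acts as a query over the
    archived-code token embeddings (keys and values).
    """
    scale = math.sqrt(len(target[0]))
    weights = [softmax([dot(q, k) / scale for k in archived]) for q in target]
    dim = len(archived[0])
    attended = [
        [sum(w * v[j] for w, v in zip(row, archived)) for j in range(dim)]
        for row in weights
    ]
    return attended, weights

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over L2-normalized embeddings.

    Samples sharing a label are positives for each other; anchors
    without any positive are skipped. The temperature is an assumed value.
    """
    z = [normalize(e) for e in embeddings]
    n = len(z)
    losses = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        positives = [j for j in others if labels[j] == labels[i]]
        if not positives:
            continue
        logits = {j: dot(z[i], z[j]) / temperature for j in others}
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        log_probs = [logits[j] - log_denom for j in positives]
        losses.append(-sum(log_probs) / len(positives))
    return sum(losses) / len(losses)

# Toy data: 2 target tokens attend over 3 archived tokens.
target = [[1.0, 0.0], [0.0, 1.0]]
archived = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
attended, weights = cross_attention(target, archived)

# Labels that match the geometric clusters should give a lower loss
# than labels that cut across them.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
clustered = supcon_loss(embeddings, [1, 1, 0, 0])
mixed = supcon_loss(embeddings, [1, 0, 1, 0])
```

In this toy setup, each row of weights sums to 1 and the first target token puts the most weight on the archived token it matches. The contrastive loss is lower when same-label embeddings are close together, which is the property a contrastive training strategy relies on to separate vulnerable from non-vulnerable code.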

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
