Detection of Software Security Weaknesses Using Cross-Language Source Code Representation (CLaSCoRe)

Sergiu Zaharia; Traian Rebedea; Stefan Trausan-Matu

doi:10.3390/app13137871

Applied Sciences (Jul 2023)

Affiliations

Sergiu Zaharia: Faculty of Automatic Control and Computers, University Politehnica of Bucharest, 060042 Bucharest, Romania
Traian Rebedea: Faculty of Automatic Control and Computers, University Politehnica of Bucharest, 060042 Bucharest, Romania
Stefan Trausan-Matu: Faculty of Automatic Control and Computers, University Politehnica of Bucharest, 060042 Bucharest, Romania

DOI: https://doi.org/10.3390/app13137871
Journal volume & issue: Vol. 13, no. 13, p. 7871

Abstract

The research presented in this paper aims to improve the detection of security weaknesses in programming languages that are poorly supported by specialized security analysis tools, by reusing the knowledge gathered from securing popular languages, for which security experts, scanners, and labeled datasets are generally available. This goal is vital for reducing the overall exposure of software applications. We propose a solution that extends security weakness detection to downstream languages, which are influenced by their more popular “ancestors” in the evolutionary tree of programming languages, using language keyword tokenization and clustering based on word embedding techniques. We show that, after training a machine learning algorithm on C, C++, and Java applications developed by a community of programmers with similar coding habits, we can detect, with acceptable accuracy, similar vulnerabilities in C# source code written by the same community. To achieve this, we propose a core cross-language representation of source code, optimized for security weakness classifiers, named CLaSCoRe. Using this method, we can achieve zero-shot vulnerability detection, in our case without using any C# source code as training data.
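The idea of mapping language-specific keywords onto shared tokens can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract says the clusters are derived from word embeddings, whereas the cluster table below is hand-written for illustration. The names `KEYWORD_CLUSTERS`, `to_shared_tokens`, and the labels `KW_LOOP`, `ID`, and `NUM` are all hypothetical.

```python
import re

# Hypothetical keyword clusters. The abstract describes clusters obtained
# from word embeddings; this hand-written table is only a stand-in.
KEYWORD_CLUSTERS = {
    "KW_LOOP": {"for", "while", "do", "foreach"},
    "KW_BRANCH": {"if", "else", "switch", "case"},
    "KW_INT": {"int", "long", "short"},
    "KW_TEXT": {"char", "string", "String"},
}

# Invert the table so each keyword maps to its cluster label.
KEYWORD_TO_CLUSTER = {
    keyword: label
    for label, keywords in KEYWORD_CLUSTERS.items()
    for keyword in keywords
}

# Identifiers/keywords, integer literals, or any single non-space symbol.
TOKEN_PATTERN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def to_shared_tokens(source):
    """Turn a source snippet into a language-neutral token sequence."""
    tokens = []
    for tok in TOKEN_PATTERN.findall(source):
        if tok in KEYWORD_TO_CLUSTER:
            tokens.append(KEYWORD_TO_CLUSTER[tok])  # keyword -> cluster label
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append("ID")                     # any other name
        elif tok.isdigit():
            tokens.append("NUM")                    # integer literal
        else:
            tokens.append(tok)                      # punctuation, operators
    return tokens


c_tokens = to_shared_tokens("for (int i = 0; i < n; i++)")
cs_tokens = to_shared_tokens("foreach (int x in items)")
print(c_tokens[:4])
print(cs_tokens[:4])
```

Under these assumptions, the C `for` loop and the C# `foreach` loop begin with the same shared tokens, which is the property a classifier trained on one language would need in order to be applied to another.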

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
