Examining the Capacity of Text Mining and Software Metrics in Vulnerability Prediction

Ilias Kalouptsoglou; Miltiadis Siavvas; Dionysios Kehagias; Alexandros Chatzigeorgiou; Apostolos Ampatzoglou

doi:10.3390/e24050651

Entropy (May 2022)

Affiliations

Ilias Kalouptsoglou: Centre for Research and Technology Hellas, 57001 Thessaloniki, Greece
Miltiadis Siavvas: Centre for Research and Technology Hellas, 57001 Thessaloniki, Greece
Dionysios Kehagias: Centre for Research and Technology Hellas, 57001 Thessaloniki, Greece
Alexandros Chatzigeorgiou: Department of Applied Informatics, University of Macedonia, 54636 Thessaloniki, Greece
Apostolos Ampatzoglou: Department of Applied Informatics, University of Macedonia, 54636 Thessaloniki, Greece

DOI: https://doi.org/10.3390/e24050651
Journal volume & issue: Vol. 24, no. 5, p. 651

Abstract

Software security is a critical concern for software development organizations that wish to provide high-quality, dependable software to their consumers. A crucial part of software security is the early detection of software vulnerabilities. Vulnerability prediction is a mechanism that facilitates the identification (and, in turn, the mitigation) of vulnerabilities early in the software development cycle. The scientific community has recently devoted considerable attention to developing Deep Learning models that use text mining techniques to predict the existence of vulnerabilities in software components. However, other studies examine whether statically extracted software metrics can yield adequate Vulnerability Prediction Models. In this paper, both software metrics-based and text mining-based Vulnerability Prediction Models are constructed and compared. A combination of software metrics and text tokens using deep-learning models is also examined, to investigate whether a combined model can lead to more accurate vulnerability prediction. For the purposes of the present study, a vulnerability dataset containing vulnerabilities from real-world software products is utilized and extended. The results of our analysis indicate that text mining-based models outperform software metrics-based models with respect to their F2-score, whereas enriching the text mining-based models with software metrics was not found to provide any added value to their predictive performance.
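
The abstract compares models by F2-score without defining it. Under the standard definition, F2 is the F-beta measure with beta = 2, which weights recall more heavily than precision. The sketch below computes it from confusion-matrix counts; the function name `f_beta` and the example counts are hypothetical illustrations and are not taken from the paper.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta from confusion-matrix counts (standard definition).

    beta > 1 weights recall above precision; beta = 2 gives the F2-score.
    The name and signature are illustrative assumptions, not the paper's code.
    """
    b2 = beta ** 2
    denominator = (1 + b2) * tp + b2 * fn + fp
    # With no true positives, false positives, or false negatives,
    # the score is undefined; return 0.0 by convention.
    return (1 + b2) * tp / denominator if denominator else 0.0


# Hypothetical counts: 8 vulnerable components found, 4 false alarms, 2 missed.
f2 = f_beta(tp=8, fp=4, fn=2)            # beta = 2: recall-weighted
f1 = f_beta(tp=8, fp=4, fn=2, beta=1.0)  # beta = 1: balanced F1
print(f"F2 = {f2:.3f}, F1 = {f1:.3f}")
```

With recall (0.8) higher than precision (about 0.67) in this made-up case, F2 comes out above F1, which reflects that a missed vulnerability is penalized more than a false alarm.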

Published in Entropy

ISSN: 1099-4300 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Astronomy: Astrophysics; Science: Physics
Website: http://www.mdpi.com/journal/entropy
