Predicting CVSS Metric via Description Interpretation

Joana Cabral Costa; Tiago Roxo; Joao B. F. Sequeiros; Hugo Proenca; Pedro R. M. Inacio

doi:10.1109/ACCESS.2022.3179692

IEEE Access (Jan 2022)

Affiliations

Joana Cabral Costa: Department of Computer Science, Instituto de Telecomunicações, University of Beira Interior, Covilhã, Portugal
Tiago Roxo: Department of Computer Science, Instituto de Telecomunicações, University of Beira Interior, Covilhã, Portugal
Joao B. F. Sequeiros: Department of Computer Science, Instituto de Telecomunicações, University of Beira Interior, Covilhã, Portugal
Hugo Proenca: Department of Computer Science, Instituto de Telecomunicações, University of Beira Interior, Covilhã, Portugal
Pedro R. M. Inacio: Department of Computer Science, Instituto de Telecomunicações, University of Beira Interior, Covilhã, Portugal

DOI: https://doi.org/10.1109/ACCESS.2022.3179692
Journal volume & issue: Vol. 10, pp. 59125–59134

Abstract

Cybercrime affects companies worldwide, costing millions of dollars annually. The constant increase in threats and vulnerabilities raises the need to handle vulnerabilities in a prioritized manner. This prioritization can be achieved through the Common Vulnerability Scoring System (CVSS), which is typically used to assign a score to a vulnerability. However, there is a time gap between the discovery of a vulnerability and the assignment of its score, which motivates the development of approaches that help with this step. We explore the use of Natural Language Processing (NLP) models to predict CVSS scores from vulnerability descriptions. We start by creating a vulnerability dataset from the National Vulnerability Database (NVD). Then, we combine text pre-processing and vocabulary addition to improve model accuracy, and we interpret the model's prediction reasoning by assessing word importance via Shapley values. Experiments show that combining Lemmatization with the addition of 5,000 words is optimal for DistilBERT, the best-performing of the NLP models in our experiments, which achieves state-of-the-art results. Furthermore, specific events (such as an attack on well-known software) tend to influence model predictions, which may hinder CVSS prediction. Combining Lemmatization with vocabulary addition mitigates this effect, contributing to increased accuracy. Finally, binary classes benefit the most from pre-processing techniques, particularly when one class is much more prominent than the other. Our work shows that DistilBERT is a state-of-the-art model for CVSS prediction and demonstrates the applicability of deep learning approaches to vulnerability handling. The code and data are available at https://github.com/Joana-Cabral/CVSS_Prediction.
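
The vocabulary-addition step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state how the added words are chosen, so the frequency-based criterion below, the helper name `select_new_vocabulary`, the regular-expression tokenizer, and the toy descriptions and base vocabulary are all assumptions made for illustration, not the authors' implementation.

```python
import re
from collections import Counter


def select_new_vocabulary(descriptions, base_vocab, k):
    """Return up to k of the most frequent words absent from base_vocab.

    Hypothetical helper: frequency ranking is an assumed selection
    criterion, not one confirmed by the paper.
    """
    counts = Counter()
    for text in descriptions:
        # Lowercase and extract alphanumeric runs; a simple stand-in
        # for a real tokenizer.
        for word in re.findall(r"[a-z0-9_]+", text.lower()):
            if word not in base_vocab:
                counts[word] += 1
    # Rank by descending frequency, breaking ties alphabetically so the
    # result is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]


# Toy vulnerability descriptions (invented for this example).
descriptions = [
    "Buffer overflow in the parser allows remote attackers to execute arbitrary code.",
    "SQL injection in login form allows remote attackers to read data.",
    "Heap overflow in the decoder allows attackers to crash the service.",
]

# Toy stand-in for a pretrained model's existing vocabulary.
base_vocab = {
    "in", "the", "to", "allows", "remote", "read", "data",
    "code", "form", "service", "login", "buffer",
}

new_words = select_new_vocabulary(descriptions, base_vocab, k=3)
print(new_words)  # ['attackers', 'overflow', 'arbitrary']
```

With the Hugging Face Transformers library, words selected this way could then be registered through `tokenizer.add_tokens(new_words)` followed by `model.resize_token_embeddings(len(tokenizer))`; whether the paper uses this exact route is not stated in the abstract.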

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
