IEEE Access (Jan 2020)
Quantifying the Significance and Relevance of Cyber-Security Text Through Textual Similarity and Cyber-Security Knowledge Graph
Abstract
In order to proactively mitigate cyber-security risks, security analysts have to continuously monitor sources of threat information. However, the sheer amount of textual information that needs to be processed is overwhelming, and it requires a great deal of mundane labor to separate the threats from the noise. We propose a novel approach to represent the relevance and significance of the cyber-security text in quantitative numbers. We trained custom Named Entity Recognition (NER) model and constructed a Cyber-security Knowledge Graph (CKG) to infer the subjective relevance of the cyber-security text to the user and to generate correlation features. In addition, the significance of the given text was analyzed in terms of its textual similarity with different repositories of pre-defined “significant” text and the maximum similarities were computed. These analysis results then act as features of the classifier to generate the significance score. The experimental result showed that the overall system could determine the significance and relevance of the text within a controlled environment with 88% accuracy.
Keywords