Array (Sep 2024)
Threat intelligence named entity recognition techniques based on few-shot learning
Abstract
In today’s digital and internet era, threat intelligence analysis is of paramount importance for ensuring network and information security. Named Entity Recognition (NER) is a fundamental task in natural language processing that aims to identify and extract specific types of named entities from text, such as person names, locations, organization names, dates, times, and currencies. The quality of the extracted entities determines the effectiveness of upper-layer applications such as knowledge graphs. However, training data in the threat intelligence field is scarce, and single models generalize poorly. To address this, we propose a multi-view learning model, the Few-shot Threat Intelligence Named Entity Recognition Model (FTM). Building on FTM, we improve the view fusion method and further propose the FTM-GRU (Gated Recurrent Unit) model. FTM uses the Tri-training algorithm to collaboratively train three few-shot NER models, leveraging the complementary nature of the different model views so that they capture more threat intelligence domain knowledge at the encoding level. FTM-GRU improves the fusion of the multiple views: it uses an improved GRU structure to control the memory and forgetting of view information, and introduces a relevance calculation unit to avoid redundant view information while highlighting important semantic features. We label and construct a few-shot Threat Intelligence Dataset (TID), and experiments on TID and on the publicly available National Vulnerability Database (NVD) validate the effectiveness of our model for NER in the threat intelligence domain. The experimental results show that the proposed model achieves better recognition results on this task.
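To make the Tri-training idea concrete, the following is a minimal sketch of the classical agreement rule, in which an unlabeled sentence becomes a pseudo-labeled training example for one model when the other two models predict the same tag sequence. It is an illustration under stated assumptions, not the FTM implementation: the abstract does not specify how agreement is measured or how the few-shot models are built, and the `Tagger` interface, the toy taggers, and the tag names below are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Tokens = Tuple[str, ...]
Tags = Tuple[str, ...]
# Hypothetical interface: a tagger maps a tokenized sentence to one tag per token.
Tagger = Callable[[Tokens], Tags]


def tri_training_pseudo_labels(
    taggers: Sequence[Tagger],
    unlabeled: Sequence[Tokens],
) -> List[List[Tuple[Tokens, Tags]]]:
    """For each tagger i, collect sentences on which the other two taggers agree."""
    if len(taggers) != 3:
        raise ValueError("tri-training uses exactly three models")
    new_data: List[List[Tuple[Tokens, Tags]]] = [[], [], []]
    for sentence in unlabeled:
        predictions = [tuple(tagger(sentence)) for tagger in taggers]
        for i in range(3):
            j, k = [m for m in range(3) if m != i]
            # Assumption: agreement means an identical tag sequence for the sentence.
            if predictions[j] == predictions[k]:
                new_data[i].append((sentence, predictions[j]))
    return new_data


# Toy stand-ins for three few-shot NER models (purely illustrative).
def tag_cve_only(tokens: Tokens) -> Tags:
    return tuple("B-VULN" if t.startswith("CVE-") else "O" for t in tokens)


def tag_cve_and_malware(tokens: Tokens) -> Tags:
    known_malware = {"Emotet"}  # hypothetical gazetteer
    return tuple(
        "B-VULN" if t.startswith("CVE-") else "B-MAL" if t in known_malware else "O"
        for t in tokens
    )


def tag_nothing(tokens: Tokens) -> Tags:
    return tuple("O" for _ in tokens)


sentences = [
    ("CVE-2021-44228", "affects", "Log4j"),
    ("Emotet", "spreads", "quickly"),
    ("patch", "now"),
]
pseudo = tri_training_pseudo_labels(
    [tag_cve_only, tag_cve_and_malware, tag_nothing], sentences
)
print([len(p) for p in pseudo])  # → [1, 2, 2]
```

In a full training loop, each model would be retrained on its labeled data plus its own pseudo-labeled list, and the procedure would repeat until the pseudo-labels stop changing. In this toy run the third tagger receives the CVE sentence labeled by the other two, which shows how one view can pass on knowledge that another view lacks.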