IEEE Access (Jan 2023)
Enhancing Cyber Threat Identification in Open-Source Intelligence Feeds Through an Improved Semi-Supervised Generative Adversarial Learning Approach With Contrastive Learning
Abstract
Open-source Threat Intelligence Feeds (OTIFs) can strengthen organizational security, but using them efficiently remains a challenge. This paper presents an approach to automated threat identification using machine learning. Current machine learning models require large volumes of high-quality annotated data, which is a major obstacle because such data is scarce in OTIFs. To address this gap, we propose a semi-supervised learning strategy that uses both labeled and unlabeled data to automate threat identification. Our contribution is an improved version of the GAN-BERT framework that incorporates a self-supervised contrastive objective function to fine-tune the BERT language model. Our experiments show that this method outperforms the original BERT and GAN-BERT, achieving a 3-12% F1-score improvement on various OTIF datasets. We also present an efficient method for selecting hard negatives during training, which further improves the model's performance. The approach reduces reliance on human supervision and addresses the issue of limited annotated data, offering a robust solution for strengthening security posture through proactive decision making in Cyber Threat Intelligence.
Keywords