Scientific Reports (Oct 2024)

Comparative evaluation of data imbalance addressing techniques for CNN-based insider threat detection

  • Taher Al-Shehari,
  • Mohammed Kadrie,
  • Mohammed Nasser Al-Mhiqani,
  • Taha Alfakih,
  • Hussain Alsalman,
  • Mueen Uddin,
  • Syed Sajid Ullah,
  • Abdulhalim Dandoush

DOI
https://doi.org/10.1038/s41598-024-73510-9
Journal volume & issue
Vol. 14, no. 1
pp. 1 – 18

Abstract

Insider threats pose a significant challenge in cybersecurity, demanding advanced detection methods for effective risk mitigation. This paper presents a comparative evaluation of data imbalance addressing techniques for CNN-based insider threat detection. Specifically, we integrate Convolutional Neural Networks (CNN) with three popular data imbalance addressing techniques: Synthetic Minority Over-sampling Technique (SMOTE), Borderline-SMOTE, and Adaptive Synthetic Sampling (ADASYN). The objective is to improve the accuracy and robustness of insider threat detection on the imbalanced datasets common in cybersecurity. Our study addresses the lack of consensus in the literature on which data imbalance addressing technique performs best in this field. We analyze the CERT dataset, a human behavior-based dataset that records users' Information Technology (IT) activities, whose substantial number of samples allows a clear conclusion on the effectiveness of these balancing techniques when coupled with CNN. Experimental results demonstrate that ADASYN, in conjunction with CNN, achieves an area under the ROC curve (AUC) of 96%, surpassing SMOTE and Borderline-SMOTE in enhancing detection accuracy on imbalanced datasets. We compare the results of these three hybrid models (CNN + imbalance addressing technique) with selected state-of-the-art studies, focusing on ROC, recall, and accuracy measures. Our findings contribute to the advancement of insider threat detection methodologies.
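To illustrate the kind of oversampling the abstract refers to, the following is a minimal sketch of the core SMOTE idea: each synthetic minority sample is placed on the line segment between a minority point and one of its nearest minority neighbours. It is not the authors' pipeline. The function name `smote_oversample`, the parameter values, and the toy data are all hypothetical, and the CNN stage is omitted. Borderline-SMOTE and ADASYN follow the same interpolation scheme but concentrate the synthetic samples on minority points that lie close to majority-class samples.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding the point itself)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical toy data: 8 "normal" and 3 "malicious" feature vectors.
majority = [(float(i), float(i % 3)) for i in range(8)]
minority = [(10.0, 10.0), (11.0, 10.5), (10.5, 11.5)]

# Generate enough synthetic minority points to match the majority class size.
new_points = smote_oversample(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

In a setup like the one the abstract describes, the balanced data would be used only for training the classifier, while evaluation metrics such as ROC and recall would be computed on untouched, still-imbalanced test data.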

Keywords