Optimising Insider Threat Prediction: Exploring BiLSTM Networks and Sequential Features

Phavithra Manoharan; Wei Hong; Jiao Yin; Hua Wang; Yanchun Zhang; Wenjie Ye

doi:10.1007/s41019-024-00260-z

Data Science and Engineering (Nov 2024)

Affiliations

Phavithra Manoharan: Institute for Sustainable Industries and Liveable Cities, Victoria University
Wei Hong: Institute for Sustainable Industries and Liveable Cities, Victoria University
Jiao Yin: Institute for Sustainable Industries and Liveable Cities, Victoria University
Hua Wang: Institute for Sustainable Industries and Liveable Cities, Victoria University
Yanchun Zhang: Institute for Sustainable Industries and Liveable Cities, Victoria University
Wenjie Ye: Institute for Sustainable Industries and Liveable Cities, Victoria University

Journal volume & issue: Vol. 9, no. 4
pp. 393 – 408

Abstract

Insider threats pose a critical risk to organisations, affecting their data, processes, resources, and overall security. The risk arises from individuals who have authorised access to, and familiarity with, internal systems, and who can therefore compromise an organisation's integrity. Previous research has addressed the challenge by pinpointing malicious actions that have already occurred, which offers limited help in preventing them. In this research, we introduce a novel approach based on bidirectional long short-term memory (BiLSTM) networks that captures and analyses the patterns of individual actions and their sequential dependencies. The focus is on predicting whether an individual will be a malicious insider on a future day, based on their daily behavioural records over the previous several days. We analyse the performance of four supervised learning algorithms on different combinations of manual features, sequential features, and the ground truth of the day. In addition, we investigate how well different recurrent models (RNN, LSTM, and BiLSTM) incorporate these features. We also explore the effect of different predictive lengths for the ground truth of the day and different embedded lengths for the sequential features. All experiments are conducted on the CERT r4.2 dataset. The experimental results show that BiLSTM achieves the highest performance in combining these features.
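
The prediction setup described in the abstract (a user's label on a future day, predicted from that user's daily records over the previous several days) can be illustrated with a small sliding-window sketch. This is not the authors' code. The function name `build_windows` and the parameters `window` and `horizon` are hypothetical, and the feature values are placeholders rather than CERT r4.2 data. The sketch only shows how history and target pairs might be formed before they are passed to a sequence model such as a BiLSTM.

```python
from typing import List, Sequence, Tuple


def build_windows(
    daily_features: Sequence[Sequence[float]],
    daily_labels: Sequence[int],
    window: int,
    horizon: int = 1,
) -> List[Tuple[List[List[float]], int]]:
    """Pair each run of `window` consecutive days with the label
    observed `horizon` days after the run ends (one user's records)."""
    if len(daily_features) != len(daily_labels):
        raise ValueError("features and labels must cover the same days")
    if window < 1 or horizon < 1:
        raise ValueError("window and horizon must be positive")

    samples: List[Tuple[List[List[float]], int]] = []
    # Last start index that still leaves room for the window and the target day.
    last_start = len(daily_features) - window - horizon
    for start in range(last_start + 1):
        end = start + window              # exclusive end of the history
        target_day = end + horizon - 1    # index of the day being predicted
        history = [list(day) for day in daily_features[start:end]]
        samples.append((history, daily_labels[target_day]))
    return samples


# Placeholder data: five days, one feature per day, 1 = malicious day.
features = [[0.0], [1.0], [2.0], [3.0], [4.0]]
labels = [0, 0, 0, 1, 0]
samples = build_windows(features, labels, window=3, horizon=1)
```

Each sample pairs a three-day history with the label of the following day. Whether past-day labels (the "ground truth of the day") are appended to each day's feature vector is left out of this sketch, since the abstract does not specify how they are encoded.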

Published in Data Science and Engineering

ISSN: 2364-1185 (Print); 2364-1541 (Online)
Publisher: SpringerOpen
Country of publisher: Germany
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://www.springer.com/41019
