Information (Jul 2024)

Extended Isolation Forest for Intrusion Detection in Zeek Data

  • Fariha Moomtaheen,
  • Sikha S. Bagui,
  • Subhash C. Bagui,
  • Dustin Mink

DOI
https://doi.org/10.3390/info15070404
Journal volume & issue
Vol. 15, no. 7
p. 404

Abstract


The novelty of this paper lies in identifying and applying hyperparameters that improve the Extended Isolation Forest (EIF) algorithm, a relatively new algorithm, for detecting malicious activity in network traffic. EIF, a variant of the Isolation Forest algorithm, is known for its efficacy in detecting anomalies in high-dimensional data. We assess the performance of the EIF model on UWF-ZeekDataFall22, a newly created dataset composed of Zeek connection logs. To handle the large volume of data involved in this research, the Hadoop Distributed File System (HDFS) provides efficient, fault-tolerant storage, and Apache Spark, a powerful open-source Big Data analytics framework, is used for the machine learning (ML) tasks. The EIF algorithm performed best at extension level 0, achieving accuracies of 82.3% for the Resource Development tactic, 82.21% for the Reconnaissance tactic, and 78.3% for the Discovery tactic.
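The abstract reports that EIF performed best at extension level 0. The pure-Python code below is a minimal sketch of what that hyperparameter controls, assuming the standard EIF construction. Each split is a random hyperplane whose normal vector keeps extension_level + 1 nonzero components, so level 0 yields the axis-parallel cuts of the original Isolation Forest. All names here (ExtendedIsolationForest, random_split, side_of, c, and the toy data) are hypothetical illustrations. This is not the authors' Spark implementation, and it does not reproduce their tuned hyperparameters.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c(n):
    """Average path length of an unsuccessful BST search over n points (score normalizer)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of the harmonic number H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def random_split(points, extension_level, rng):
    """Draw a random hyperplane: a normal vector plus an intercept inside the data's bounding box."""
    dim = len(points[0])
    normal = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    # Zero out dim - extension_level - 1 coordinates; level 0 keeps one, giving an axis-parallel cut.
    for i in rng.sample(range(dim), dim - extension_level - 1):
        normal[i] = 0.0
    lows = [min(p[j] for p in points) for j in range(dim)]
    highs = [max(p[j] for p in points) for j in range(dim)]
    intercept = [rng.uniform(lo, hi) for lo, hi in zip(lows, highs)]
    return normal, intercept


def side_of(x, normal, intercept):
    """Signed projection (x - intercept) . normal; <= 0 goes left, > 0 goes right."""
    return sum((xi - pi) * wi for xi, pi, wi in zip(x, intercept, normal))


def build_tree(points, depth, max_depth, extension_level, rng):
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}  # external node records how many points reached it
    normal, intercept = random_split(points, extension_level, rng)
    left = [p for p in points if side_of(p, normal, intercept) <= 0]
    right = [p for p in points if side_of(p, normal, intercept) > 0]
    return {
        "normal": normal,
        "intercept": intercept,
        "left": build_tree(left, depth + 1, max_depth, extension_level, rng),
        "right": build_tree(right, depth + 1, max_depth, extension_level, rng),
    }


def path_length(x, node, depth=0):
    if "size" in node:
        # The unbuilt subtree below a depth-limited leaf is estimated with c(size).
        return depth + c(node["size"])
    child = node["left"] if side_of(x, node["normal"], node["intercept"]) <= 0 else node["right"]
    return path_length(x, child, depth + 1)


class ExtendedIsolationForest:
    def __init__(self, n_trees=100, sample_size=256, extension_level=0, seed=0):
        self.n_trees = n_trees
        self.sample_size = sample_size
        self.extension_level = extension_level
        self.rng = random.Random(seed)
        self.trees = []
        self.psi = 0

    def fit(self, data):
        dim = len(data[0])
        if not 0 <= self.extension_level <= dim - 1:
            raise ValueError("extension_level must lie in [0, dim - 1]")
        if len(data) < 2:
            raise ValueError("need at least two points")
        self.psi = min(self.sample_size, len(data))
        max_depth = math.ceil(math.log2(self.psi))  # usual height limit for isolation trees
        self.trees = [
            build_tree(self.rng.sample(data, self.psi), 0, max_depth, self.extension_level, self.rng)
            for _ in range(self.n_trees)
        ]
        return self

    def score(self, x):
        """Anomaly score 2^(-E[h(x)] / c(psi)); values closer to 1 suggest an anomaly."""
        mean_path = sum(path_length(x, t) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_path / c(self.psi))


# Toy usage: a 2-D Gaussian cluster plus one far-away point.
rng = random.Random(42)
data = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)] + [(6.0, 6.0)]
forest = ExtendedIsolationForest(n_trees=100, sample_size=128, extension_level=0, seed=7).fit(data)
inlier_score = forest.score((0.0, 0.0))
outlier_score = forest.score((6.0, 6.0))
print(f"inlier {inlier_score:.3f}, outlier {outlier_score:.3f}")
```

In this 2-D toy example, setting extension_level=1 would let cuts run at arbitrary angles (the fully extended case), while level 0 restricts them to one coordinate at a time. In either case, points isolated in fewer cuts receive higher scores.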

Keywords