Intrusion Detection Based on Sequential Information Preserving Log Embedding Methods and Anomaly Detection Algorithms

Czangyeob Kim; Myeongjun Jang; Seungwan Seo; Kyeongchan Park; Pilsung Kang

doi:10.1109/ACCESS.2021.3071763

IEEE Access (Jan 2021)

Affiliations

Czangyeob Kim: School of Industrial Management Engineering, Korea University, Seoul, Republic of Korea
Myeongjun Jang: Department of Computer Science, University of Oxford, Oxford, U.K.
Seungwan Seo: School of Industrial Management Engineering, Korea University, Seoul, Republic of Korea
Kyeongchan Park: School of Industrial Management Engineering, Korea University, Seoul, Republic of Korea
Pilsung Kang: School of Industrial Management Engineering, Korea University, Seoul, Republic of Korea

Journal volume & issue: Vol. 9
pp. 58088 – 58101

Abstract

Previous methods for system intrusion detection have mainly relied on pattern matching, which employs prior knowledge extracted from domain experts. However, pattern matching-based methods have a major drawback: they can be bypassed through various modified attack techniques. Advanced persistent threats expose this limitation of pattern matching-based detection mechanisms, because they are not only more sophisticated than ordinary threats but also specialized for the targeted object of attack. To handle such advanced threats successfully, a defense mechanism must be able to recognize unusual phenomena or behaviors. To achieve this, various machine learning-based security techniques have recently been developed. Among these, anomaly detection algorithms, which are trained in an unsupervised fashion, can reduce the effort required of security experts and help secure labeled datasets through post-analysis. If a sufficient amount of labeled data is obtained through post-analysis of the anomaly detection results, abnormal behaviors can be distinguished even more precisely by training classification models. In this study, we propose an end-to-end abnormal behavior detection method based on sequential information preserving log embedding algorithms and machine learning-based anomaly detection algorithms. Unlike other machine learning-based system anomaly detection models, which borrow domain experts' knowledge to extract significant features from the log data, our method transforms raw log data into fixed-size continuous vectors regardless of their length, and these vectors are used to train the anomaly detection models.
Experimental results on a real system call trace dataset show that our proposed log embedding method, combined with an unsupervised anomaly detection model, yields favorable performance, reaching an AUROC of up to 0.8708. This can be further improved to 0.9745 with supervised classification algorithms if sufficient labeled attack log data become available.
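The central step described above has two parts. A raw log of any length is mapped to a fixed-size continuous vector, and that vector is then scored by an anomaly detector. The sketch below illustrates both parts. It is not the authors' sequential information preserving embedding. As a deliberately simple stand-in, it assumes each log is a list of system call names and hashes overlapping n-grams into a fixed number of buckets. The n-grams keep some local call order, unlike a plain bag of calls. The anomaly score is the distance to the nearest trace assumed to be normal. The names `embed_trace` and `knn_anomaly_score`, the parameters `dim`, `n` and `k`, and the toy traces are all hypothetical.

```python
import math
import zlib
from collections import Counter


def embed_trace(calls, dim=64, n=3):
    """Hypothetical stand-in embedding: map a variable-length list of system
    call names to a fixed-size, L2-normalised vector of hashed n-gram counts."""
    # Overlapping n-grams keep local call order; a trace shorter than n
    # yields a single (shorter) gram so that every trace gets a vector.
    grams = [tuple(calls[i:i + n]) for i in range(max(len(calls) - n + 1, 1))]
    vec = [0.0] * dim
    for gram, count in Counter(grams).items():
        # crc32 is deterministic across runs, unlike Python's salted hash().
        bucket = zlib.crc32(" ".join(gram).encode()) % dim
        vec[bucket] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def knn_anomaly_score(x, train_vectors, k=1):
    """Mean Euclidean distance from x to its k nearest training vectors
    (the training traces are assumed to be normal behaviour)."""
    dists = sorted(math.dist(x, t) for t in train_vectors)
    return sum(dists[:k]) / k


# Toy usage: traces of different lengths all become 64-dimensional vectors.
normal_traces = [
    ["open", "read", "read", "close"],
    ["open", "read", "write", "close"],
    ["open", "read", "read", "read", "close"],
]
train = [embed_trace(t) for t in normal_traces]
benign = embed_trace(["open", "read", "read", "close"])
suspicious = embed_trace(["socket", "connect", "execve", "setuid", "write"])
print(knn_anomaly_score(benign, train))  # 0.0 (identical to a training trace)
print(knn_anomaly_score(suspicious, train))
```

Every trace lands in the same `dim`-dimensional space, so any vector-based anomaly detector could replace the nearest-neighbour score. A learned sequence embedding, like the ones the abstract refers to, could capture longer-range order than fixed n-grams do.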
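The reported figures are AUROC values. The function below is a minimal sketch of how such a number is obtained from anomaly scores. It uses the rank interpretation of AUROC: the probability that a randomly chosen attack trace (label 1) scores higher than a randomly chosen normal trace (label 0), with ties counted as one half. The function name `auroc` and the toy scores are illustrative assumptions. In practice, a library routine such as scikit-learn's `roc_auc_score(y_true, y_score)` computes the same quantity more efficiently.

```python
def auroc(scores, labels):
    """AUROC via its rank (Mann-Whitney) interpretation.

    Compares every attack score with every normal score: a higher attack
    score counts 1, a tie counts 0.5. O(P * N) pairwise loop, for clarity.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one attack and one normal example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy usage: anomaly scores for two normal (0) and two attack (1) traces.
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(auroc(scores, labels))  # 0.75
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation. The value can therefore be compared directly across unsupervised detectors and supervised classifiers, as in the results above.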

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639