IEEE Access (Jan 2022)
Data Discretization and Decision Boundary Data Point Analysis for Unknown Attack Detection
Abstract
Researchers have long sought effective ways to detect unknown (zero-day) cyberattacks in real time. Most current methods rely on pattern recognition to identify known threats when they appear. Recently, machine learning anomaly detection tools, which train a model on normal network data, have been used to identify outliers that represent unknown attacks. However, detecting unknown attacks remains difficult because of the lack of information about them, class imbalance in the data, and the failure to accurately detect attacks that exhibit normal patterns. To overcome these problems, this study applied data discretization and decision-boundary data point analysis to scrutinize patterns near the thresholds of uncertainty. A novel discretization method was used to effectively train a model for fuzzy c-means feature analysis of the data points at the decision boundary, through which adversarial features were detected and classified based on their entropy. Consequently, it was possible to identify incorrectly detected attack data distributed near the model’s decision boundary. The proposed method was evaluated on the NSL-KDD dataset, which is commonly used to evaluate ML intrusion detection systems. The results showed that our model successfully identified attacks at the decision boundary and that its performance can be improved through classification. In addition, after classification, detection accuracy improved over that of existing models by 5 to 7% for DoS attacks, 7 to 10% for Probe, 4 to 7% for R2L, and 1 to 9% for U2R.
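As a rough illustration of how fuzzy c-means memberships and entropy can single out points near a decision boundary, the following minimal sketch may help. It is not the method evaluated in this study: the cluster centers are taken as given, and the function names, the fuzzifier `m = 2.0`, and the entropy threshold `0.9` are hypothetical choices made only for this example.

```python
import math


def fcm_memberships(x, centers, m=2.0):
    """Standard fuzzy c-means membership of point x in each cluster.

    The centers are assumed to be known already; m is the fuzzifier (m > 1).
    """
    dists = [math.dist(x, c) for c in centers]
    # A point that coincides with a center belongs fully to that cluster.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    exponent = 2.0 / (m - 1.0)
    inverse = [d ** -exponent for d in dists]
    total = sum(inverse)
    return [v / total for v in inverse]


def membership_entropy(u):
    """Shannon entropy (in bits) of a membership vector."""
    return -sum(p * math.log2(p) for p in u if p > 0.0)


def is_boundary_point(x, centers, threshold=0.9, m=2.0):
    """Flag x as a boundary point when its membership entropy is high.

    The threshold is an illustrative assumption, not a value from the study.
    """
    return membership_entropy(fcm_memberships(x, centers, m)) >= threshold
```

With two centers, a point equidistant from both receives memberships of 0.5 each and the maximum entropy of 1 bit, so it is flagged as a boundary point, whereas a point close to one center has entropy near zero and is not.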
Keywords