Applied Sciences (Aug 2024)

Extraction of Minimal Set of Traffic Features Using Ensemble of Classifiers and Rank Aggregation for Network Intrusion Detection Systems

  • Jacek Krupski,
  • Marcin Iwanowski,
  • Waldemar Graniszewski

DOI
https://doi.org/10.3390/app14166995
Journal volume & issue
Vol. 14, no. 16
p. 6995

Abstract

Read online

Network traffic classification models, an essential part of intrusion detection systems, need to be as simple as possible due to the high speed of network transmission. One of the fastest approaches is based on decision trees, where the classification process requires a series of tests, resulting in a class assignment. In the network traffic classification process, these tests are performed on extracted traffic features. The classification computational efficiency grows when the number of features and their tests in the decision tree decreases. This paper investigates the relationship between the number of features used to construct the decision-tree-based intrusion detection model and the classification quality. This work deals with a reference dataset that includes IoT/IIoT network traffic. A feature selection process based on the aggregated rank of features computed as the weighted average of rankings obtained using multiple (in this case, six) classifier-based feature selectors is proposed. It results in a ranking of 32 features sorted by importance and usefulness in the classification process. In the outcome of this part of the study, it turns out that acceptable classification results for the smallest number of best features are achieved for the eight most important features at −95.3% accuracy. In the second part of these experiments, the dependence of the classification speed and accuracy on the number of most important features taken from this ranking is analyzed. In this investigation, optimal times are also obtained for eight or fewer number of the most important features, e.g., the trained decision tree needs 0.95 s to classify nearly 7.6 million samples containing eight network traffic features. The conducted experiments prove that a subset of just a few carefully selected features is sufficient to obtain reasonably high classification accuracy and computational efficiency.

Keywords