Gong-kuang zidonghua (Aug 2022)
Multi-feature fusion based encrypted malicious traffic detection method for coal mine network
Abstract
The coal mine network is faced with the threat of malicious traffic encrypted by the transport layer security protocol (TLS) generated by malicious software and the high false alarm rate of encrypted traffic during detection. In order to solve the above problems, a multi-feature fusion malicious traffic detection method for coal mine network TLS encryption is proposed. The characteristics of multiple and heterogeneous malicious traffic features of TLS encryption are analyzed. The connection features, metadata and TLS encrypted protocol handshake features of coal mine network TLS encrypted malicious traffic in the transmission process are extracted. A coal mine network TLS encrypted traffic characteristic set is constructed by using a flow fingerprint method. The features in the feature set are standardized, one-hot encoded and normalized, so as to obtain an efficient sample set. Five sub-models of decision tree (DT), K-nearest neighbor (KNN), Gaussian Naive Bayes (GNB), L2 logistic regression (LR) and stochastic gradient descent (SGD) classifiers were used to test the above feature sets. In order to improve the robustness of the detection model, combined with the principle of the voting method, five classifier sub-models are combined to construct a muti-model voting classifier (MVC) detection model. Five classifier sub-models are used as voters. Each classifier sub-model trains the sample set separately, and votes according to the principle of minority obeying majority to get the final prediction value of each sample. The experimental results show that the proposed feature set reduces the dimension of the sample set and improves the detection efficiency of TLS encrypted traffic. DT classifier and KNN classifier perform best on the data set, reaching more than 99% accuracy. But they have the risk of overfitting. Although the LR classifier and SGD classifier sub-models have also achieved recognition accuracy of more than 90%, the false positive rate of these two sub-models is too high. The GNB classifier sub-model performs the worst, with an accuracy of 82%. But it has the advantage of low false-positive rate. The accuracy and recall rate of that MVC detection model on a data set is more than 99%, the false alarm rate is 0.13%. The detection rate of encrypted malicious traffic is improved, and the false alarm rate of encrypted traffic detection is 0. And the comprehensive performance of the MVC detection model is better than that of other classifier sub-models.
Keywords