Future Internet (Oct 2022)
A Self-Supervised Learning Model for Unknown Internet Traffic Identification Based on Surge Period
Abstract
The identification of Internet protocols provides a significant basis for keeping Internet security and improving Internet Quality of Service (QoS). However, the overwhelming developments and updating of Internet technologies and protocols have led to large volumes of unknown Internet traffic, which threaten the safety of the network environment a lot. Since most of the unknown Internet traffic does not have any labels, it is difficult to adopt deep learning directly. Additionally, the feature accuracy and identification model also impact the identification accuracy a lot. In this paper, we propose a surge period-based feature extraction method that helps remove the negative influence of background traffic in network sessions and acquire as many traffic flow features as possible. In addition, we also establish an identification model of unknown Internet traffic based on JigClu, the self-supervised learning approach to training unlabeled datasets. It finally combines with the clustering method and realizes the further identification of unknown Internet traffic. The model has been demonstrated with an accuracy of no less than 74% in identifying unknown Internet traffic with the public dataset ISCXVPN2016 under different scenarios. The work provides a novel solution for unknown Internet traffic identification, which is the most difficult task in identifying Internet traffic. We believe it is a great leap in Internet traffic identification and is of great significance to maintaining the security of the network environment.
Keywords