Scientific Data (May 2025)
A Real Network Environment Dataset for Traffic Analysis
Abstract
The objective of internet traffic analysis is to identify latent patterns and ascertain the true state of internet operations by examining traffic data, and it is regarded as an effective means of achieving accurate network management. Existing network traffic datasets, however, are predominantly collected in laboratory environments and therefore lack authenticity in terms of network scale, users, behaviours, and temporal and spatial characteristics. To address this gap, this paper proposes an in-situ network deployment and data collection scheme involving a large number of devices and users. The scheme collects a large dataset of real Internet traffic, including both encrypted and non-encrypted traffic, through sensors deployed on real-world network access equipment. After desensitization, cleaning, feature engineering and labelling, the data are released as an open database for researchers in the field of traffic analysis to use in academic and engineering work.