Scientific Data (Oct 2024)

CESNET-TLS-Year22: A year-spanning TLS network traffic dataset from backbone lines

  • Karel Hynek,
  • Jan Luxemburk,
  • Jaroslav Pešek,
  • Tomáš Čejka,
  • Pavel Šiška

DOI
https://doi.org/10.1038/s41597-024-03927-4
Journal volume & issue
Vol. 11, no. 1
pp. 1 – 10

Abstract

The modern approach for network traffic classification (TC), which is an important part of operating and securing networks, is to use machine learning (ML) models that are able to learn intricate relationships between traffic characteristics and communicating applications. A crucial prerequisite is having representative datasets. However, datasets collected from real production networks are not being published in sufficient numbers. Thus, this paper presents a novel dataset, CESNET-TLS-Year22, that captures the evolution of TLS traffic in an ISP network over a year. The dataset contains 180 web service labels and standard TC features, such as packet sequences. The unique year-long time span enables comprehensive evaluation of TC models and assessment of their robustness in the face of the ever-changing environment of production networks.