Faculty of Economics, Department of Applied Mathematics and Informatics, University of South Bohemia in České Budějovice, Czechia; Faculty of Information Technology, Department of Computer Systems, Czech Technical University in Prague, Czechia; Corresponding author at: Faculty of Economics, Department of Applied Mathematics and Informatics, University of South Bohemia in České Budějovice, Czechia.
Tereza Čapková
Faculty of Economics, Department of Applied Mathematics and Informatics, University of South Bohemia in České Budějovice, Czechia
Michal Konopa
Faculty of Economics, Department of Applied Mathematics and Informatics, University of South Bohemia in České Budějovice, Czechia
Ladislav Beránek
Faculty of Economics, Department of Applied Mathematics and Informatics, University of South Bohemia in České Budějovice, Czechia
Jan Fiala
Faculty of Economics, Department of Applied Mathematics and Informatics, University of South Bohemia in České Budějovice, Czechia
Michal Houda
Faculty of Economics, Department of Applied Mathematics and Informatics, University of South Bohemia in České Budějovice, Czechia
Petr Chládek
Faculty of Economics, Department of Applied Mathematics and Informatics, University of South Bohemia in České Budějovice, Czechia
Jana Klicnarová
Faculty of Economics, Department of Applied Mathematics and Informatics, University of South Bohemia in České Budějovice, Czechia
Radim Remeš
Faculty of Economics, Department of Applied Mathematics and Informatics, University of South Bohemia in České Budějovice, Czechia
Marek Šulista
Faculty of Economics, Department of Applied Mathematics and Informatics, University of South Bohemia in České Budějovice, Czechia
Klára Vocetková
Faculty of Economics, Department of Applied Mathematics and Informatics, University of South Bohemia in České Budějovice, Czechia
Marie Feslová
Faculty of Economics, Department of Applied Mathematics and Informatics, University of South Bohemia in České Budějovice, Czechia
In this paper, we introduce a unique dataset covering thousands of network flow measurements carried out over TCP in a data center environment. TCP is widely used for reliable data transfer and exists in many versions, which differ in how their congestion control algorithm (CCA) deals with link congestion. Our dataset provides a comprehensive comparison of 17 currently used versions of TCP with different CCAs. Each TCP flow was measured exactly 50 times to mitigate measurement instability. The comparison of the TCP versions is based on 18 quantitative attributes describing the parameters of a TCP transmission. The dataset is suitable for testing and comparing different versions of TCP, for creating new CCAs based on machine learning models, and for creating and testing machine learning models that identify and optimize existing versions of TCP.