Data in Brief (Jun 2024)
A user DNS fingerprint dataset
Abstract
A user's DNS fingerprint makes it possible to identify a specific network user without knowing their IP address. This method is suitable, for example, when the behavior of a monitored network user needs to be examined in more depth. In contrast to other studies, this work introduces a dataset for user identification based only on knowledge of the user's DNS fingerprint, which is created from previously sent DNS queries. We created a large dataset from the real network traffic of a metropolitan Internet service provider. The dataset was created from 2.3 billion DNS queries representing 6.2 million different domain names. Data collection took place over three months, from December 2023 to February 2024. The dataset describes user activity in detail, in the form of overall daily activity statistics and detailed 24 h activity statistics. Each dataset record contains a list of 1137 classification attributes. A unique feature of this dataset is the classification of user activity based on the categories of content accessed by a user. The new dataset can be used to create machine learning models that identify a specific user without direct knowledge of their IP address or additional network location information. The dataset can also serve as a reference dataset for creating DNS fingerprints of users.
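To make the idea of a DNS fingerprint more concrete, the following minimal sketch shows how daily totals, a 24 h activity histogram, and per-category counts might be derived from one user's DNS queries. It is an illustration under stated assumptions, not the procedure used to build the dataset: the function `build_fingerprint`, the mapping `CATEGORY_OF`, the field names, and the sample domains are all hypothetical, and the dataset's actual 1137 attributes are not reproduced here.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical domain-to-category map (assumption); the dataset's real
# content categories are not listed in this abstract.
CATEGORY_OF = {
    "news.example.com": "news",
    "video.example.net": "streaming",
    "mail.example.org": "email",
}


def build_fingerprint(queries):
    """Build a toy fingerprint from (unix_timestamp, domain) pairs.

    The input is assumed to hold one user's DNS queries for one day.
    """
    hourly = [0] * 24          # 24 h activity histogram (UTC hours)
    categories = Counter()     # queries per content category
    for ts, domain in queries:
        hour = datetime.fromtimestamp(ts, tz=timezone.utc).hour
        hourly[hour] += 1
        categories[CATEGORY_OF.get(domain, "unknown")] += 1
    return {
        "total_queries": len(queries),
        "unique_domains": len({domain for _, domain in queries}),
        "hourly_counts": hourly,
        "category_counts": dict(categories),
    }


# Made-up sample queries, for illustration only.
sample = [
    (1704096000, "news.example.com"),   # 2024-01-01 08:00:00 UTC
    (1704096060, "news.example.com"),   # 2024-01-01 08:01:00 UTC
    (1704139200, "video.example.net"),  # 2024-01-01 20:00:00 UTC
    (1704139260, "other.example.io"),   # 2024-01-01 20:01:00 UTC
]
fingerprint = build_fingerprint(sample)
```

A vector of this kind, computed per user and per day, is the sort of input a classifier could use to recognize a user without referring to an IP address.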