IEEE Access (Jan 2023)

TI-16 DNS Labeled Dataset for Detecting Botnets

  • Manmeet Singh,
  • Maninder Singh,
  • Sanmeet Kaur

DOI
https://doi.org/10.1109/ACCESS.2023.3287141
Journal volume & issue
Vol. 11
pp. 62616 – 62629

Abstract

Botnets continue to evolve despite many efforts by law enforcement agencies and security researchers, and the number of cybercrimes has increased as a result. This has led to a greater research focus on botnet detection. Botnets and cybercrime keep growing despite this focus partly because a significant number of the proposed techniques are not reproducible (their source code is unavailable), do not include a description detailed enough for effective comparison, and cannot be evaluated against a real-world labeled dataset, since none is available. The unavailability of a labeled real-world dataset for bot infection detection is a grave problem. This paper aims to create a public, labeled, real-world Domain Name System (DNS) dataset for bot infection detection. The dataset contains real-world DNS traffic of benign and malicious hosts. It has 24 features and is labeled to list hosts infected with Domain Generation Algorithm (DGA) bots, along with the botnet family name and the DGA domains used for command-and-control (C&C) communication. A total of 7644 hosts were found infected with nine different botnets, namely modpack, virut, necurs, conficker, ud3, suppobox, nymain, tofsee and pitou. Finally, a machine learning classifier is developed to distinguish DGA bots from normal hosts using these features, with an accuracy of 99.59%.
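
As a rough illustration of what per-host DNS features for DGA detection might look like, the following Python sketch aggregates query records by host. The abstract does not list the dataset's 24 features, so the four features computed here (query count, unique domains, NXDOMAIN ratio, mean label entropy), the function names, and the record layout are all assumptions made for illustration, not the paper's method.

```python
import math
from collections import Counter, defaultdict


def label_entropy(domain: str) -> float:
    """Shannon entropy (bits per character) of the leftmost DNS label."""
    label = domain.lower().rstrip(".").split(".")[0]
    if not label:
        return 0.0
    counts = Counter(label)
    n = len(label)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def host_features(queries):
    """Aggregate (host, domain, rcode) records into per-host features.

    The record layout and the feature names are hypothetical; they are
    not taken from the TI-16 dataset.
    """
    per_host = defaultdict(list)
    for host, domain, rcode in queries:
        per_host[host].append((domain, rcode))

    features = {}
    for host, records in per_host.items():
        domains = {d for d, _ in records}
        nxdomain = sum(1 for _, r in records if r == "NXDOMAIN")
        features[host] = {
            "query_count": len(records),
            "unique_domains": len(domains),
            # DGA bots often query many unregistered domains.
            "nxdomain_ratio": nxdomain / len(records),
            # Algorithmically generated labels often look more random.
            "mean_label_entropy": sum(label_entropy(d) for d in domains)
            / len(domains),
        }
    return features


if __name__ == "__main__":
    # Made-up sample records, for demonstration only.
    sample = [
        ("10.0.0.1", "example.com", "NOERROR"),
        ("10.0.0.2", "xkqzjwpv.net", "NXDOMAIN"),
        ("10.0.0.2", "qpwvzkxj.org", "NXDOMAIN"),
    ]
    for host, feats in host_features(sample).items():
        print(host, feats)
```

A feature table of this kind, one row per host, could then be passed to any standard classifier; the abstract does not state which classifier the authors used.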

Keywords