CLEI Electronic Journal (May 2022)

Short-time prediction of DNS queries using deep learning and pre-trained word embedding

  • Jorge Merlino,
  • Pablo Rodríguez-Bocca

DOI
https://doi.org/10.19153/cleiej.25.2.6
Journal volume & issue
Vol. 25, no. 2

Abstract

Word embeddings are used in natural language processing to group semantically similar words. In this paper, we create word embeddings for Internet domain names from corpora of anonymized Domain Name System (DNS) queries collected at an Internet Service Provider. We use each embedding as a layer of a recurrent neural network (RNN) that works as a language model for the DNS queries generated by the users. We use these RNNs to predict the next DNS query in two different cases. The first case predicts the next domain queried from the DNS server’s point of view, so the corpus stays close to the original log data. The second case predicts the next domain queried by a user from the user’s point of view; here the corpus requires more extensive preprocessing. We show that this procedure has good accuracy for the DNS server-side problem, but low accuracy for the user-side problem. Moreover, we show that training the same RNN without the pre-trained embedding takes more time and is substantially less accurate. These results have practical applications in service latency reduction, cache optimization in recursive DNS servers, automatic filtering of inappropriate domains, and anomaly detection.
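
The idea of feeding a pre-trained embedding into a recurrent language model can be illustrated with a minimal sketch. It is not the authors' implementation: the vocabulary, the dimensions, the random weights standing in for a pre-trained embedding table, and the plain tanh recurrent cell are all assumptions chosen only to keep the example small and dependency-free. It shows the forward pass only (no training): each queried domain is looked up in a frozen embedding table, the recurrent state is updated, and a softmax over the vocabulary gives a distribution for the next query.

```python
import math
import random

# Hypothetical toy vocabulary of domain names (not taken from the paper's corpus).
VOCAB = ["example.com", "cdn.example.com", "mail.example.org", "news.example.net"]
EMBED_DIM = 3   # assumed embedding size
HIDDEN_DIM = 5  # assumed recurrent state size

rng = random.Random(0)  # fixed seed so the sketch is reproducible


def random_matrix(rows, cols):
    """Return a rows x cols matrix of small random weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Stand-in for a pre-trained embedding table (one row per domain). In practice
# it would be loaded from an embedding model trained on the query corpus and
# kept frozen while the rest of the network is trained.
PRETRAINED = random_matrix(len(VOCAB), EMBED_DIM)

# Untrained weights of a plain tanh recurrent cell and the output layer.
W_xh = random_matrix(HIDDEN_DIM, EMBED_DIM)
W_hh = random_matrix(HIDDEN_DIM, HIDDEN_DIM)
W_hy = random_matrix(len(VOCAB), HIDDEN_DIM)


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def next_query_distribution(queries):
    """Run the recurrent cell over a query sequence and score the next domain."""
    hidden = [0.0] * HIDDEN_DIM
    for query in queries:
        embedded = PRETRAINED[VOCAB.index(query)]  # embedding layer: table lookup
        hidden = [
            math.tanh(a + b)
            for a, b in zip(matvec(W_xh, embedded), matvec(W_hh, hidden))
        ]
    return softmax(matvec(W_hy, hidden))


def predict_next(queries):
    """Return the most probable next domain for a query sequence."""
    probs = next_query_distribution(queries)
    best = max(range(len(VOCAB)), key=probs.__getitem__)
    return VOCAB[best]
```

Calling `predict_next(["example.com", "cdn.example.com"])` returns one of the domains in `VOCAB`; with untrained weights the choice is arbitrary, and it would only become meaningful after fitting the recurrent and output weights on real query sequences.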

Keywords