IEEE Access (Jan 2023)

SHAP Interpretations of Tree and Neural Network DNS Classifiers for Analyzing DGA Family Characteristics

  • Nikos Kostopoulos,
  • Dimitris Kalogeras,
  • Dimitris Pantazatos,
  • Maria Grammatikou,
  • Vasilis Maglaris

DOI
https://doi.org/10.1109/ACCESS.2023.3286313
Journal volume & issue
Vol. 11
pp. 61144–61160

Abstract

Domain Generation Algorithms (DGAs) have been employed by botnet orchestrators to control infected hosts (bots) while evading detection, issuing numerous DNS requests, mostly for non-existent domain names. With blacklists ineffective, modern DGA filtering methods rely on Machine Learning (ML). Emerging needs for higher intrusion detection accuracy lead to complex, non-interpretable black-box classifiers, thus requiring eXplainable Artificial Intelligence (XAI) techniques. In this paper, we utilize SHapley Additive exPlanations (SHAP) to derive model-agnostic, post-hoc interpretations of DGA name classifiers. The method is applied to binary supervised tree-based classifiers (e.g. eXtreme Gradient Boosting - XGBoost) and deep neural networks (Multi-Layer Perceptron - MLP) to assess domain name feature importance. SHAP visualization tools (summary, dependence, and force plots) are used to rank features, investigate their effect on model decisions, and determine their interactions. Specific interpretations are detailed for identifying names belonging to common DGA families based on arithmetic, wordlist, hash, and permutation schemes. Learning and interpretations rely on up-to-date datasets, such as Tranco for benign and DGArchive for malicious names. Domain name features are extracted directly from dataset instances, thus limiting time-consuming and privacy-invasive database operations on historical data. Our experimental results demonstrate that SHAP enables explanations of XGBoost (the most accurate tree-based model) and MLP classifiers and reveals the characteristics of specific DGA schemes commonly employed in attacks. In conclusion, we envision that XAI methods will expedite ML deployment in networking environments where justifications for black-box model decisions are required.
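
The sketch below is a minimal illustration (not the authors' code) of the workflow the abstract describes: lexical features extracted from domain names, an XGBoost binary classifier, and post-hoc SHAP interpretations via a tree explainer and summary plot. The feature set (length, entropy, digit ratio, vowel ratio), the toy domains, labels, and hyperparameters are illustrative assumptions, not the paper's actual configuration.

```python
# Minimal sketch: SHAP interpretations of an XGBoost DGA classifier.
# Features, toy data, and hyperparameters below are illustrative assumptions.
import math
from collections import Counter

import numpy as np
import pandas as pd
import shap
import xgboost as xgb


def extract_features(domain: str) -> dict:
    """Derive simple lexical features from the second-level label of a domain name."""
    name = domain.split(".")[0].lower()
    counts = Counter(name)
    entropy = -sum((c / len(name)) * math.log2(c / len(name)) for c in counts.values())
    return {
        "length": len(name),
        "entropy": entropy,  # character-level Shannon entropy
        "digit_ratio": sum(ch.isdigit() for ch in name) / len(name),
        "vowel_ratio": sum(name.count(v) for v in "aeiou") / len(name),
    }


# Toy dataset: benign-looking names vs. arithmetic-DGA-looking names (labels assumed).
domains = ["google.com", "wikipedia.org", "github.com",
           "xj3k9q0a7z.net", "qpwo19slak.biz", "zz81kd0qy3.info"]
labels = [0, 0, 0, 1, 1, 1]

X = pd.DataFrame([extract_features(d) for d in domains])
y = np.array(labels)

# Train a small binary XGBoost classifier (placeholder hyperparameters).
model = xgb.XGBClassifier(n_estimators=50, max_depth=3, eval_metric="logloss")
model.fit(X, y)

# Post-hoc, tree-specific SHAP explanations: per-feature contributions per domain.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Global feature ranking (summary plot) and a local explanation of one prediction.
shap.summary_plot(shap_values, X, show=False)
print("SHAP values for", domains[3], "->", dict(zip(X.columns, shap_values[3])))
```

In this sketch, the summary plot ranks features by their overall contribution across the dataset, while the per-instance SHAP values indicate how each feature pushed a single domain toward the benign or DGA class, mirroring the global and local interpretations discussed in the paper.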

Keywords