IEEE Access (Jan 2024)
Detecting Domain Names Generated by DGAs With Low False Positives in Chinese Domain Names
Abstract
Attackers are known to use domain generation algorithms (DGAs) to generate domain names for command and control (C&C) servers and to facilitate the distribution of uniform resource locators within malicious software. DGAs pose a significant threat to cybersecurity owing to their ability to dynamically generate unpredictable domain names. Extensive research is underway to detect domain names created by DGAs. However, high false positive rates on benign domain names in non-English languages remain a challenge. This study therefore proposes a DGA detection method that effectively embeds non-English domain names, focusing on Chinese domain names, defined here as domain names composed of Pinyin. The proposed method segments domain names into meaningful subwords for effective vector representation. The FastText model then learns the context information of the segmented subwords and embeds the domain name. Next, a deep learning-based detection model learns from the vectorized domain names and determines whether a particular domain name is DGA-generated. For our experiment, we labeled the Chinese domain names among the benign domain names. The experimental results show that the proposed method outperforms existing methods across all performance metrics on the entire test dataset. Notably, the proposed method minimizes the false positive rate, thereby enhancing detection reliability. In addition, it achieves recalls of 0.9873 and 0.9886 for Chinese and English domain names, respectively. This demonstrates that the proposed method consistently delivers high performance across various metrics and languages.
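As an illustration of the segmentation step only, the following minimal sketch splits a domain label into subwords using greedy longest-match against a small vocabulary. It is not the segmentation procedure used in this study, which the abstract does not specify: the vocabulary `VOCAB`, the function name `segment`, and the `max_len` parameter are all hypothetical and chosen for the example.

```python
from typing import List, Set

# Hypothetical toy vocabulary mixing a few Pinyin syllables and English words.
# A real system would need a far larger vocabulary or a trained segmenter.
VOCAB: Set[str] = {"bei", "jing", "shang", "hai", "da", "xue", "news", "shop"}


def segment(label: str, vocab: Set[str] = VOCAB, max_len: int = 6) -> List[str]:
    """Split a domain label into subwords by greedy longest-match.

    At each position the longest vocabulary entry (up to max_len characters)
    is taken; if nothing matches, the single character becomes its own token.
    """
    tokens: List[str] = []
    i = 0
    while i < len(label):
        # Try the longest candidate first, then progressively shorter ones.
        for j in range(min(len(label), i + max_len), i, -1):
            if label[i:j] in vocab:
                tokens.append(label[i:j])
                i = j
                break
        else:
            # No vocabulary entry starts here: fall back to one character.
            tokens.append(label[i])
            i += 1
    return tokens


if __name__ == "__main__":
    print(segment("beijingnews"))  # Pinyin-style benign label
    print(segment("xq7k"))         # random-looking label
```

Under these assumptions, a Pinyin-based label breaks into a few syllable-level tokens, whereas a random-looking label degrades into single characters. Token sequences of this kind are what a subword-aware embedding model such as FastText would then be trained on.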
Keywords