Detecting Stealthy Domain Generation Algorithms Using Heterogeneous Deep Neural Network Framework

Luhui Yang; Guangjie Liu; Yuewei Dai; Jinwei Wang; Jiangtao Zhai

doi:10.1109/ACCESS.2020.2988877

IEEE Access (Jan 2020)

Affiliations

Luhui Yang: School of Automation, Nanjing University of Science and Technology, Nanjing, China
Guangjie Liu: School of Electronic and Information Engineering, Nanjing University of Information Science and Technology, Nanjing, China
Yuewei Dai: School of Electronic and Information Engineering, Nanjing University of Information Science and Technology, Nanjing, China
Jinwei Wang: Department of Computer and Software, Nanjing University of Information Science and Technology, Nanjing, China
Jiangtao Zhai: School of Electronic and Information Engineering, Nanjing University of Information Science and Technology, Nanjing, China

DOI: https://doi.org/10.1109/ACCESS.2020.2988877
Journal volume & issue: Vol. 8, pp. 82876–82889

Abstract

Distinguishing malicious domain names generated by various domain generation algorithms (DGAs) is critical for defending a network against sophisticated attacks. In recent years, stealthy domain generation algorithms (SDGAs) have been proposed that are significantly stealthier than traditional character-based DGAs, and existing state-of-the-art detection schemes are not effective enough against them. In this paper, we exploit the character-level characteristics of SDGA domain names and propose a heterogeneous deep neural network framework (HDNN) for detecting SDGAs. HDNN employs an improved parallel CNN (IPCNN) architecture with convolution kernels of multiple sizes to extract multi-scale local features from a domain name. The framework also contains a self-attention-based bidirectional long short-term memory (SA-Bi-LSTM) architecture, which extracts bidirectional global features from a domain name using an attention mechanism. In addition, the focal loss function is introduced to mitigate the imbalance in the number of samples per class during training. Benchmark experiments are carried out on a dataset composed of collected benign domain names and real-world DGA and SDGA domain names. Compared with six influential deep-learning-based DGA detection schemes, the proposed scheme achieves state-of-the-art detection results on SDGAs, and also achieves state-of-the-art results on binary and multiclass classification of traditional DGAs.
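
As an illustration of the focal loss mentioned in the abstract, the following is a minimal sketch of its standard form, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), for a single sample. It is not the authors' implementation: the function names, the value gamma = 2.0, and the optional per-class weights `alpha` are assumptions chosen for illustration, and the abstract does not state which settings the paper uses.

```python
import math


def focal_loss(probs, label, gamma=2.0, alpha=None):
    """Focal loss for one sample (illustrative sketch, not the paper's code).

    probs : list of predicted class probabilities (assumed to sum to 1)
    label : index of the true class
    gamma : focusing parameter; 2.0 is a commonly used value (assumption)
    alpha : optional list of per-class weights (assumption)
    """
    # Clamp the true-class probability to avoid log(0).
    p_t = min(max(probs[label], 1e-12), 1.0)
    weight = 1.0 if alpha is None else alpha[label]
    # (1 - p_t) ** gamma shrinks the loss of well-classified samples.
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)


def cross_entropy(probs, label):
    """Plain cross-entropy, which is focal loss with gamma = 0 and no weights."""
    return focal_loss(probs, label, gamma=0.0)


if __name__ == "__main__":
    easy = [0.1, 0.9]  # true class 1 predicted with high confidence
    hard = [0.9, 0.1]  # true class 1 predicted with low confidence
    for name, probs in (("easy", easy), ("hard", hard)):
        print(name, cross_entropy(probs, 1), focal_loss(probs, 1))
```

With gamma > 0, a confidently correct sample contributes far less than under cross-entropy, while a poorly classified one keeps most of its loss. This is why the loss can help when a few classes dominate the training set.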

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
