Research on the Construction of Malware Variant Datasets and Their Detection Method

Faming Lu; Zhaoyang Cai; Zedong Lin; Yunxia Bao; Mengfan Tang

doi:10.3390/app12157546

Applied Sciences (Jul 2022)

Affiliations

Faming Lu: College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China
Zhaoyang Cai: College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China
Zedong Lin: College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China
Yunxia Bao: College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China
Mengfan Tang: College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China

Journal volume & issue: Vol. 12, no. 15
p. 7546

Abstract


Malware detection is of great significance for maintaining the security of information systems. Malware obfuscation techniques and malware variants are emerging at an increasing rate, but samples of these variants and their API (application programming interface) call sequences are difficult to obtain, which hinders the development of malware variant detection models. To address this issue, in this paper we first generated a malware variant dataset by applying obfuscation techniques to disassembled and decompiled malware. Then, an API call dataset of these malware variants was constructed through sandboxing. In contrast to similar work, all of the malware variants and obfuscated API call sequences generated in this paper are runnable. Next, taking a public API call sequence dataset of obfuscation-free malware as input, we constructed a BERT (Bidirectional Encoder Representations from Transformers) pretrained model for malware detection. To enhance this pretrained model's ability to handle obfuscation and variants, we used adversarial training to improve the robustness and generalization of the detection model under obfuscation. The experimental results show that the proposed scheme improves the classification performance on malware variants under obfuscation, and the accuracy of malware variant classification was close to that of the unobfuscated case.
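The abstract mentions adversarial training on API call sequences but does not say which adversarial scheme is used. The sketch below is a minimal illustration under stated assumptions, not the authors' method. It assumes an FGM-style (Fast Gradient Method) perturbation of the input embeddings, and it replaces the BERT encoder with a toy model: mean-pooled API-call embeddings followed by logistic regression. The vocabulary, training sequences, `ToyDetector`, `train_toy_detector`, `eps`, and learning rate are all hypothetical.

```python
import math
import random

# Hypothetical toy vocabulary of Windows API names; not taken from the paper's dataset.
VOCAB = ["CreateFileW", "WriteFile", "RegSetValueExW", "GetTickCount", "Sleep", "VirtualAlloc"]
DIM = 4  # embedding size (illustrative)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class ToyDetector:
    """Hypothetical stand-in for BERT: mean-pooled API embeddings + logistic regression."""

    def __init__(self, seed: int = 0):
        rng = random.Random(seed)
        self.emb = {api: [rng.uniform(-0.5, 0.5) for _ in range(DIM)] for api in VOCAB}
        self.w = [0.0] * DIM
        self.b = 0.0

    def pool(self, seq):
        # Average the embedding vectors of the API calls in the sequence.
        return [sum(self.emb[api][i] for api in seq) / len(seq) for i in range(DIM)]

    def prob(self, x):
        # Probability that the pooled sequence x is malicious.
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def train_step(self, seq, label, lr=0.2, eps=0.05):
        x = self.pool(seq)
        # Clean pass: for binary cross-entropy, dL/dz = p - y and dL/dx = (p - y) * w.
        g_z = self.prob(x) - label
        g_x = [g_z * wi for wi in self.w]
        # FGM-style step: shift x by eps along the L2-normalized loss gradient.
        norm = math.sqrt(sum(g * g for g in g_x)) or 1.0
        x_adv = [xi + eps * gi / norm for xi, gi in zip(x, g_x)]
        g_z_adv = self.prob(x_adv) - label
        # Gradients of (clean loss + adversarial loss); the perturbation is held constant.
        grad_w = [g_z * xi + g_z_adv * xa for xi, xa in zip(x, x_adv)]
        grad_b = g_z + g_z_adv
        grad_x = [(g_z + g_z_adv) * wi for wi in self.w]  # uses w before the update
        # Update the classifier parameters.
        self.w = [wi - lr * gw for wi, gw in zip(self.w, grad_w)]
        self.b -= lr * grad_b
        # Mean pooling sends grad_x / len(seq) back to each API-call occurrence.
        for api in seq:
            vec = self.emb[api]
            for i in range(DIM):
                vec[i] -= lr * grad_x[i] / len(seq)


# Hypothetical training data: label 1 = malicious, 0 = benign.
MALICIOUS = ["VirtualAlloc", "WriteFile", "RegSetValueExW"]
BENIGN = ["CreateFileW", "GetTickCount", "Sleep"]


def train_toy_detector(epochs: int = 300) -> ToyDetector:
    det = ToyDetector()
    for _ in range(epochs):
        det.train_step(MALICIOUS, 1)
        det.train_step(BENIGN, 0)
    return det


if __name__ == "__main__":
    det = train_toy_detector()
    # An obfuscation-like variant: a junk Sleep call inserted into the malicious sequence.
    variant = ["VirtualAlloc", "Sleep", "WriteFile", "RegSetValueExW"]
    for name, seq in [("malicious", MALICIOUS), ("benign", BENIGN), ("variant", variant)]:
        print(name, round(det.prob(det.pool(seq)), 3))
```

For simplicity, the perturbation here is applied to the pooled vector rather than to each token embedding, as FGM is usually applied to a transformer's embedding layer. For a linear classifier, the adversarial term acts like a penalty on the norm of `w`, so `eps` is kept small relative to the embedding scale to avoid pinning the weights near zero.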

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
