N-gram MalGAN: Evading machine learning detection via feature n-gram

Enmin Zhu; Jianjie Zhang; Jijie Yan; Kongyang Chen; Chongzhi Gao

Digital Communications and Networks (Aug 2022)

Affiliations

Enmin Zhu: School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou, 510006, China
Jianjie Zhang: School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou, 510006, China
Jijie Yan: School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou, 510006, China
Kongyang Chen: Institute of Artificial Intelligence and Blockchain, Guangzhou University, Guangzhou, 510006, China; Pazhou Lab, Guangzhou, 510330, China; Corresponding author.
Chongzhi Gao: School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou, 510006, China

Journal volume & issue: Vol. 8, no. 4, pp. 485–491

Abstract

In recent years, many methods for generating adversarial malware examples with different feature strategies, especially GAN and its variants, have been introduced to address security threats such as evading machine learning detectors. However, these solutions still suffer from complicated deployment or long running times. In this paper, we propose an n-gram MalGAN method to solve these problems. We borrow the idea of the n-gram from Natural Language Processing (NLP) to expand the feature sources for adversarial malware examples in MalGAN. The n-gram MalGAN obtains the feature vector directly from the hexadecimal bytecodes of the executable file. It can be implemented easily in a simple programming language (e.g., C++), with no need for prior knowledge of the executable file or for professional feature extraction tools. These features are functionally independent and thus can be added to the non-functional area of the malicious program while preserving its original executability. In this way, the n-gram makes the adversarial attack easier and more convenient. Experimental results show that, under an appropriate group rate, the evasion rate of the n-gram MalGAN is at least 88.58% against different machine learning algorithms and reaches 100% for the Random Forest algorithm.
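
As an illustration of what obtaining a feature vector directly from hexadecimal bytecodes can look like, the following is a minimal Python sketch of byte-level n-gram extraction. It is not the authors' implementation: the function names `byte_ngrams` and `feature_vector`, the choice of n = 2, the binary presence encoding, and the toy vocabulary are all assumptions made for the example, since the abstract does not specify them.

```python
from collections import Counter
from typing import List


def byte_ngrams(data: bytes, n: int = 2) -> Counter:
    """Count overlapping n-grams of raw bytes, keyed by their hex string.

    Hypothetical helper: the abstract only says features come from the
    hexadecimal bytecodes; the sliding-window scheme is an assumption.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    # Slide a window of n bytes over the data, one byte at a time.
    return Counter(data[i:i + n].hex() for i in range(len(data) - n + 1))


def feature_vector(data: bytes, vocabulary: List[str], n: int = 2) -> List[int]:
    """Binary presence vector over a fixed n-gram vocabulary (assumed encoding)."""
    counts = byte_ngrams(data, n)
    return [1 if gram in counts else 0 for gram in vocabulary]


# Toy input: six bytes written in hex, and a three-entry vocabulary.
sample = bytes.fromhex("4d5a90000300")
vocab = ["4d5a", "9000", "ffff"]
print(feature_vector(sample, vocab))  # prints [1, 1, 0]
```

The sketch needs only the standard library and no parsing of the executable's structure, which matches the abstract's point that no prior knowledge of the file format or dedicated feature extraction tool is required.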

Published in Digital Communications and Networks

ISSN: 2352-8648 (Online)
Publisher: KeAi Communications Co., Ltd.
Country of publisher: China
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: https://www.keaipublishing.com/en/journals/digital-communications-and-networks/
