Digital Communications and Networks (Aug 2022)
N-gram MalGAN: Evading machine learning detection via feature n-gram
Abstract
In recent years, many adversarial malware examples with different feature strategies, especially GAN and its variants, have been introduced to handle the security threats, e.g., evading the detection of machine learning detectors. However, these solutions still suffer from problems of complicated deployment or long running time. In this paper, we propose an n-gram MalGAN method to solve these problems. We borrow the idea of n-gram from the Natural Language Processing (NLP) area to expand feature sources for adversarial malware examples in MalGAN. Generally, the n-gram MalGAN obtains the feature vector directly from the hexadecimal bytecodes of the executable file. It can be implemented easily and conveniently with a simple program language (e.g., C++), with no need for any prior knowledge of the executable file or any professional feature extraction tools. These features are functionally independent and thus can be added to the non-functional area of the malicious program to maintain its original executability. In this way, the n-gram could make the adversarial attack easier and more convenient. Experimental results show that the evasion rate of the n-gram MalGAN is at least 88.58% to attack different machine learning algorithms under an appropriate group rate, growing to even 100% for the Random Forest algorithm.