IEEE Access (Jan 2020)

File Entropy Signal Analysis Combined With Wavelet Decomposition for Malware Classification

  • Hui Guo,
  • Shuguang Huang,
  • Cheng Huang,
  • Zulie Pan,
  • Min Zhang,
  • Fan Shi

DOI
https://doi.org/10.1109/ACCESS.2020.3020330
Journal volume & issue
Vol. 8
pp. 158961 – 158971

Abstract


With the rapid development of the Internet, the number of malware variants has grown exponentially, posing a key threat to cyber security. Persistent efforts have been made to classify malware variants, but many challenges remain, including the inability to reliably classify variants that belong to similar families and the high time and resource cost of classification. This paper proposes a novel method, called Malware Entropy Sequences Reflect the Family (MESRF), which improves malware classification using features derived from entropy sequences. In prior research, entropy has performed well in many areas. First, global features of the signal are extracted from the entropy sequences using statistical methods. Next, local features (i.e., structural entropy features) are extracted with a discrete wavelet decomposition algorithm and vectorized with the bag-of-words model, which gives the method high classification accuracy. To evaluate the method, we conducted numerous experiments on malware datasets containing more than 20,000 samples. In these experiments, MESRF outperformed other malware classification models, reaching accuracy and ROC values of 99.83% and 99.98%, respectively, on the Malimg dataset.
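The pipeline summarized above (entropy sequence, global statistics, wavelet-based local features, bag-of-words vectorization) can be illustrated with a minimal Python sketch. It is not the authors' MESRF implementation. The choices here are assumptions made for illustration only: a non-overlapping 256-byte window for the entropy sequence, a Haar wavelet for the discrete wavelet decomposition, and a hypothetical "quantize coefficients, then count adjacent symbol pairs" scheme standing in for the bag-of-words step. The function names, window size, wavelet family, and bin settings are likewise hypothetical.

```python
import math
from collections import Counter


def entropy_sequence(data: bytes, window: int = 256, step: int = 256) -> list[float]:
    """Shannon entropy (bits per byte, range 0..8) of each window of the file.

    Assumption: fixed, non-overlapping 256-byte windows (not from the paper).
    """
    seq = []
    for start in range(0, len(data) - window + 1, step):
        counts = Counter(data[start:start + window])
        probs = (c / window for c in counts.values())
        seq.append(sum(-p * math.log2(p) for p in probs))
    return seq


def global_features(seq: list[float]) -> dict[str, float]:
    """Simple statistics over the whole entropy signal (global features)."""
    if not seq:
        raise ValueError("empty entropy sequence")
    mean = sum(seq) / len(seq)
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / len(seq))
    return {"mean": mean, "std": std, "min": min(seq), "max": max(seq)}


def haar_step(signal: list[float]) -> tuple[list[float], list[float]]:
    """One level of the orthonormal Haar DWT; a trailing odd sample is dropped."""
    r = 1 / math.sqrt(2)
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal) - 1, 2)]
    approx = [(a + b) * r for a, b in pairs]
    detail = [(a - b) * r for a, b in pairs]
    return approx, detail


def wavelet_decompose(signal: list[float], levels: int = 3):
    """Multi-level Haar decomposition: final approximation plus per-level details.

    Assumption: Haar is used as a stand-in; the paper's wavelet may differ.
    """
    approx, details = list(signal), []
    for _ in range(levels):
        if len(approx) < 2:
            break
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


def bag_of_words(coeffs: list[float], n_bins: int = 4,
                 lo: float = -8.0, hi: float = 8.0) -> Counter:
    """Hypothetical vectorization: quantize each coefficient into one of n_bins
    symbols (clamped to [lo, hi]) and count adjacent symbol pairs as "words"."""
    width = (hi - lo) / n_bins
    symbols = [min(n_bins - 1, max(0, int((c - lo) // width))) for c in coeffs]
    return Counter(f"{a}-{b}" for a, b in zip(symbols, symbols[1:]))


if __name__ == "__main__":
    # Synthetic "file": 1 KiB of zero bytes followed by 1 KiB of repeating 0..255.
    data = bytes(1024) + bytes(range(256)) * 4
    seq = entropy_sequence(data)
    print("entropy sequence:", [round(h, 2) for h in seq])
    print("global features:", global_features(seq))
    approx, details = wavelet_decompose(seq, levels=3)
    for level, d in enumerate(details, start=1):
        print(f"level {level} detail:", [round(x, 2) for x in d], bag_of_words(d))
```

On this synthetic input the entropy sequence jumps from 0 (zero-filled windows) to 8 bits per byte (windows where every byte value appears once). The Haar detail coefficients stay at zero at the finer levels and capture the jump as a single large coefficient at the coarsest level. This kind of localized structure is what wavelet-based local features can expose and a bag-of-words vector can then count.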

Keywords