IEEE Access (Jan 2019)

A Novel Solutions for Malicious Code Detection and Family Clustering Based on Machine Learning

  • Hangfeng Yang,
  • Shudong Li,
  • Xiaobo Wu,
  • Hui Lu,
  • Weihong Han

DOI
https://doi.org/10.1109/ACCESS.2019.2946482
Journal volume & issue
Vol. 7
pp. 148853 – 148860

Abstract

Malware has become a major threat to cyberspace security, not only because of the increasing complexity of malware itself, but also because new malicious code is continuously being created. In this paper, we propose two novel methods to address the malware identification problem. The first addresses malware classification: unlike traditional machine learning approaches, our method introduces ensemble models. The second addresses malware family clustering: unlike classic malware family clustering algorithms, our method uses the t-SNE algorithm to visualize the feature data and then determine the number of malware families. Both methods have been extensively tested on a large number of real-world malware samples. The results show that the first is far superior to existing individual models and that the second adapts well. Our methods can be used for malicious code classification and family clustering with higher accuracy.
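
To illustrate the ensemble idea mentioned in the abstract, here is a minimal sketch of hard (majority) voting over several base classifiers. The abstract does not say which ensemble scheme or base models the authors use, so the voting rule, the three base classifiers, and the feature layout (entropy, import count, file size) are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Callable, List, Sequence

# A base classifier maps a feature vector to a label.
Classifier = Callable[[Sequence[float]], str]


def majority_vote(classifiers: List[Classifier], sample: Sequence[float]) -> str:
    """Return the label chosen by the most base classifiers (hard voting)."""
    votes = Counter(clf(sample) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Hypothetical base models: each inspects one made-up feature.
# Assumed feature layout: [entropy, import_count, file_size_kb].
def by_entropy(x: Sequence[float]) -> str:
    return "malicious" if x[0] > 7.0 else "benign"


def by_imports(x: Sequence[float]) -> str:
    return "malicious" if x[1] > 50 else "benign"


def by_size(x: Sequence[float]) -> str:
    return "malicious" if x[2] < 100 else "benign"


# An odd number of voters avoids ties when there are two labels.
ENSEMBLE = [by_entropy, by_imports, by_size]
```

A single base model can be wrong on a given sample while the vote is still correct, which is the general reason an ensemble may outperform its individual members.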

Keywords