Malware recognition approach based on self‐similarity and an improved clustering algorithm

Jinfu Chen; Chi Zhang; Saihua Cai; Zufa Zhang; Lu Liu; Longxia Huang

doi:10.1049/sfw2.12067

IET Software (Oct 2022)

Affiliations

Jinfu Chen: School of Computer Science and Communication Engineering Jiangsu University Zhenjiang China
Chi Zhang: School of Computer Science and Communication Engineering Jiangsu University Zhenjiang China
Saihua Cai: School of Computer Science and Communication Engineering Jiangsu University Zhenjiang China
Zufa Zhang: School of Computer Science and Communication Engineering Jiangsu University Zhenjiang China
Lu Liu: School of Computing and Mathematical Sciences University of Leicester Leicester UK
Longxia Huang: School of Computer Science and Communication Engineering Jiangsu University Zhenjiang China

DOI: https://doi.org/10.1049/sfw2.12067
Journal volume & issue: Vol. 16, no. 5, pp. 527–541

Abstract

The recognition of malware in network traffic is an important research problem. However, existing solutions rely heavily on source code and, in some cases, misrecognise vulnerabilities, i.e. incur a high false positive rate (FPR). In this paper, we first use the K‐means clustering algorithm to extract malware patterns under user‐to‐root attacks in network traffic. Because the traditional K‐means algorithm requires the number of clusters to be fixed in advance and is easily affected by the choice of initial cluster centres, we propose an improved K‐means clustering algorithm (the NIKClustering algorithm) for cluster analysis. Furthermore, we propose using self‐similarity together with our improved clustering algorithm to recognise buffer overflow vulnerabilities for malware in network traffic. On this basis, we design and implement a recognition approach for buffer overflow vulnerabilities, called Reliable Self‐Similarity with Improved K‐means Clustering (RSS‐IKClustering). Extensive experiments on two different datasets demonstrate that RSS‐IKClustering produces far fewer false positives than other notable approaches while increasing accuracy. We further apply RSS‐IKClustering to a public dataset (Center for Applied Internet Data Analysis), where it also achieved a high accuracy of 96% and a low FPR of 1.5%.
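The two K‐means limitations named in the abstract can be illustrated with a small sketch. This is plain textbook K‐means, not the NIKClustering algorithm, whose details the abstract does not give. The function names, the toy data and the starting centres are all hypothetical choices made for illustration. The caller must supply the number of clusters (through the length of `init_centres`), and two different sets of initial centres lead to two different final clusterings of the same points.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans(points: Sequence[Point], init_centres: Sequence[Point],
           max_iter: int = 100) -> Tuple[List[Point], List[int], float]:
    """Textbook K-means; k is fixed by the number of initial centres."""
    centres = list(init_centres)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centre.
        labels = [min(range(len(centres)), key=lambda j: sq_dist(p, centres[j]))
                  for p in points]
        # Update step: each centre moves to the mean of its members.
        new_centres = []
        for j in range(len(centres)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centres.append((sum(x for x, _ in members) / len(members),
                                    sum(y for _, y in members) / len(members)))
            else:
                new_centres.append(centres[j])  # keep an empty cluster's centre
        if new_centres == centres:  # converged
            break
        centres = new_centres
    # Inertia: total squared distance of points to their assigned centre.
    inertia = sum(sq_dist(p, centres[lab]) for p, lab in zip(points, labels))
    return centres, labels, inertia


# Toy data (hypothetical): two tight groups, far apart along the x axis.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

# Same data, same k = 2, different initial centres.
_, good_labels, good_inertia = kmeans(points, [(0.0, 0.5), (4.0, 0.5)])
_, bad_labels, bad_inertia = kmeans(points, [(2.0, 0.0), (2.0, 1.0)])

print(good_labels, good_inertia)
print(bad_labels, bad_inertia)
```

The second run converges immediately to a poor local optimum that splits the points by height rather than by group, which is the sensitivity to initial centres that the abstract gives as motivation for an improved algorithm.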

Published in IET Software

ISSN: 1751-8806 (Print); 1751-8814 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software
Website: https://ietresearch.onlinelibrary.wiley.com/journal/ietsfw
