International Journal of Data and Network Science (Jan 2024)

Customized K-nearest neighbors’ algorithm for malware detection

  • Mosleh M. Abualhaj,
  • Ahmad Adel Abu-Shareha,
  • Qusai Y. Shambour,
  • Adeeb Alsaaidah,
  • Sumaya N. Al-Khatib,
  • Mohammed Anbar

DOI
https://doi.org/10.5267/j.ijdns.2023.9.012
Journal volume & issue
Vol. 8, no. 1
pp. 431 – 438

Abstract

The security and integrity of computer systems and networks depend heavily on malware detection. In this field, the K-Nearest Neighbors (KNN) algorithm is a popular and successful machine learning algorithm. However, the choice of distance metric parameter has a significant impact on the KNN algorithm's performance. This study aims to improve malware detection by adjusting the KNN algorithm's distance metric parameter. The distance metric determines how similarity or dissimilarity between instances in the feature space is measured, so carefully choosing or modifying it can make the KNN algorithm more accurate and effective for malware detection. This paper analyzes three distance metrics: Minkowski distance, Manhattan distance, and Euclidean distance. These metrics capture different aspects of similarity while accounting for the traits of malware samples. The effectiveness of the KNN algorithm is evaluated using the MalMem-2022 malware dataset, and the results are reported separately for each of the three distance metrics. The experimental findings show that, among the three, the Euclidean and Minkowski distance metrics produced the best results for binary classification, whereas for multiclass classification the KNN algorithm achieved its best results with the Manhattan distance.
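
To illustrate how the distance metric can change a KNN decision, the following is a minimal sketch in plain Python. It is not the authors' implementation and does not use the MalMem-2022 dataset: the two-feature toy points, the labels, and the helper names (`minkowski`, `manhattan`, `euclidean`, `knn_predict`) are assumptions made only for this example. The Minkowski distance of order p reduces to the Manhattan distance for p = 1 and to the Euclidean distance for p = 2.

```python
from collections import Counter


def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def manhattan(a, b):
    """Manhattan distance: Minkowski distance with p = 1."""
    return minkowski(a, b, 1)


def euclidean(a, b):
    """Euclidean distance: Minkowski distance with p = 2."""
    return minkowski(a, b, 2)


def knn_predict(train_X, train_y, query, k, distance):
    """Return the majority label among the k training points nearest to query."""
    # Rank training indices by their distance to the query point.
    ranked = sorted(range(len(train_X)), key=lambda i: distance(train_X[i], query))
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two samples with two numeric features each.
toy_X = [(3.0, 3.0), (0.0, 5.0)]
toy_y = ["malware", "benign"]
toy_query = (0.0, 0.0)

# With k = 1 the two metrics pick different nearest neighbors:
# Euclidean ranks (3, 3) closer, Manhattan ranks (0, 5) closer.
label_euclidean = knn_predict(toy_X, toy_y, toy_query, 1, euclidean)
label_manhattan = knn_predict(toy_X, toy_y, toy_query, 1, manhattan)

if __name__ == "__main__":
    print("Euclidean:", label_euclidean)
    print("Manhattan:", label_manhattan)
```

In this toy case the query is labeled differently under the two metrics, which is the kind of sensitivity the study examines. On real data the features would normally be scaled first, since all three metrics are sensitive to feature magnitudes.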