Features Engineering to Differentiate between Malware and Legitimate Software

Ammar Yahya Daeef; Ali Al-Naji; Ali K. Nahar; Javaan Chahl

doi:10.3390/app13031972

Applied Sciences (Feb 2023)

Affiliations

Ammar Yahya Daeef: Technical Institute for Administration, Middle Technical University, Baghdad 10074, Iraq
Ali Al-Naji: Electrical Engineering Technical College, Middle Technical University, Baghdad 10022, Iraq
Ali K. Nahar: Electrical Engineering Department, University of Technology, Baghdad 10066, Iraq
Javaan Chahl: School of Engineering, University of South Australia, Mawson Lakes, SA 5095, Australia

DOI: https://doi.org/10.3390/app13031972
Journal volume & issue: Vol. 13, no. 3, p. 1972

Abstract

Malware is the primary attack vector against the modern enterprise, so it is crucial for businesses to keep malware out of their computer systems. The most responsive solution would operate in real time at the edge of the IT system using artificial intelligence. However, a solution at the edge must be lightweight, because edge devices are restricted by limited memory and processing power. The best contender for such a solution is features derived from application programming interface (API) calls. However, engineering API call features that offer a high malware detection rate with quick execution is a significant challenge. To this end, this work uses visualisation analysis and Jaccard similarity to uncover the hidden patterns produced by different API calls. The study also compares neural networks, which use long sequences of API calls, with shallow machine learning classifiers. Three classifiers are used: support vector machine (SVM), k-nearest neighbours (KNN), and random forest (RF). The benchmark data set comprises 43,876 examples of API call sequences, divided into two categories: malware and legitimate. The results show that RF performed similarly to long short-term memory (LSTM) and deep graph convolutional neural networks (DGCNNs). They also suggest the potential for performing inference on edge devices in a real-time setting.
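As a rough illustration of the Jaccard similarity mentioned in the abstract, the sketch below compares two API call traces as sets. It is not the authors' pipeline: the function name `jaccard_similarity` and both example traces are hypothetical, and treating two empty traces as identical is an assumed convention.

```python
def jaccard_similarity(calls_a, calls_b):
    """Return |A ∩ B| / |A ∪ B| for two collections of API call names."""
    set_a, set_b = set(calls_a), set(calls_b)
    union = set_a | set_b
    if not union:
        # Assumed convention: two empty traces count as identical.
        return 1.0
    return len(set_a & set_b) / len(union)


# Hypothetical traces, invented for illustration (not taken from the data set).
malware_trace = ["CreateFileW", "WriteFile", "RegSetValueExW", "CreateRemoteThread"]
benign_trace = ["CreateFileW", "ReadFile", "CloseHandle"]

# One shared call out of six distinct calls in total.
similarity = jaccard_similarity(malware_trace, benign_trace)
print(f"Jaccard similarity: {similarity:.3f}")
```

Because the traces are reduced to sets, call order and repetition are ignored; a low score only indicates that the two samples invoke largely different API calls.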

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
