Detection of Data Scarce Malware Using One-Shot Learning With Relation Network

Faiza Babar Khan; Muhammad Hanif Durad; Asifullah Khan; Farrukh Aslam Khan; Sajjad Hussain Chauhdary; Mohammed Alqarni

doi:10.1109/ACCESS.2023.3293117

IEEE Access (Jan 2023)

Affiliations

Faiza Babar Khan: CIPMA Laboratory, DCIS, Pakistan Institute of Engineering and Applied Sciences, Islamabad, Pakistan
Muhammad Hanif Durad: CIPMA Laboratory, DCIS, Pakistan Institute of Engineering and Applied Sciences, Islamabad, Pakistan
Asifullah Khan: Pattern Recognition Laboratory, DCIS, PIEAS, Nilore, Islamabad, Pakistan
Farrukh Aslam Khan: Center of Excellence in Information Assurance (CoEIA), King Saud University, Riyadh, Saudi Arabia
Sajjad Hussain Chauhdary: Department of Computer Science and Artificial Intelligence, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia
Mohammed Alqarni: Department of Software Engineering, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia

DOI: https://doi.org/10.1109/ACCESS.2023.3293117
Journal volume & issue: Vol. 11, pp. 74438–74457

Abstract

Malware has evolved into a major threat to information security, and efficient anti-malware software is essential for safeguarding confidential information. However, identifying malware remains a challenging task. Signature-based detection methods are fast but fail to detect unknown malware. Traditional machine learning, in turn, requires a large amount of data to be effective, which hinders an anti-malware system from quickly learning about new threats from limited training samples. In real-world settings, the majority of malware takes the form of Portable Executable (PE) files. PE files come in various formats, but samples of some formats, such as ocx, acm, com, and scr, are not readily available in large numbers. Building a conventional Machine Learning (ML) model that generalizes well to these data-scarce PE formats is therefore difficult. In such a scenario, Few-Shot Learning (FSL) helps detect malware even with a very small number of training samples. In this paper, we propose a novel architecture based on the Relation Network for FSL. We propose a Discriminative Feature Embedder for feature extraction. The extracted features are passed to our proposed Relation Module (RM), which measures similarity and produces relation scores that lead to improved classification. We use the PE file formats ocx, acm, com, and scr after transforming them into images. We employ five-shot learning and then one-shot learning. The proposed architecture outperforms the baseline method and achieves up to 94% accuracy with only one training sample.
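
The pipeline described in the abstract (file to image, embedder, relation module, relation score) can be illustrated with a minimal sketch. This is not the authors' implementation: the byte-to-grayscale mapping, the column-mean embedder, the layer sizes, the untrained random weights, and the "malware"/"benign" labels are all assumptions made for illustration, and every function name below is hypothetical. In a real system the embedder would be a trained convolutional network and the relation module would be trained episodically.

```python
import math
import random

# All sizes and weights are illustrative assumptions, not values from the paper.
WIDTH, EMB_DIM, HIDDEN = 8, 4, 6
rng = random.Random(0)

def rand_matrix(rows, cols):
    """Untrained random weights; a real model would learn these."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

W_EMBED = rand_matrix(EMB_DIM, WIDTH)
W_HIDDEN = rand_matrix(HIDDEN, 2 * EMB_DIM)
W_OUT = rand_matrix(1, HIDDEN)

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def relu(v):
    return [max(0.0, x) for x in v]

def bytes_to_image(data, width=WIDTH):
    """Assumed transformation: one byte per grayscale pixel in [0, 1],
    with the last row zero-padded to the full width."""
    padded = data + bytes(-len(data) % width)
    return [[b / 255.0 for b in padded[i:i + width]]
            for i in range(0, len(padded), width)]

def embed(image):
    """Stand-in for a feature embedder: column means, then linear + ReLU."""
    col_means = [sum(col) / len(image) for col in zip(*image)]
    return relu(matvec(W_EMBED, col_means))

def relation_score(support_emb, query_emb):
    """Stand-in relation module: concatenate both embeddings and map
    them through a small network to a similarity score in [0, 1]."""
    hidden = relu(matvec(W_HIDDEN, support_emb + query_emb))
    logit = matvec(W_OUT, hidden)[0]
    return 1.0 / (1.0 + math.exp(-logit))

def classify_one_shot(support, query_bytes):
    """One labelled sample per class; predict the class whose support
    sample has the highest relation score with the query."""
    query_emb = embed(bytes_to_image(query_bytes))
    scores = {label: relation_score(embed(bytes_to_image(sample)), query_emb)
              for label, sample in support.items()}
    return max(scores, key=scores.get), scores

# Hypothetical one-shot episode built from synthetic byte strings.
support = {"malware": bytes(range(0, 64)), "benign": bytes([7] * 64)}
label, scores = classify_one_shot(support, bytes(range(1, 65)))
print(label, scores)
```

Because the weights are untrained, the printed prediction carries no meaning; the sketch only shows how a single support sample per class and a query flow through embedding and relation scoring to a class decision.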

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
