Advanced Machine Learning Based Malware Detection Systems

Song-Kyoo Kim; Xiaomei Feng; Hussam Al Hamadi; Ernesto Damiani; Chan Yeob Yeun; Sivaprasad Nandyala

doi:10.1109/ACCESS.2024.3434629

IEEE Access (Jan 2024)

Affiliations

Song-Kyoo Kim: Faculty of Applied Sciences, Macao Polytechnic University, Macau, SAR, China
Xiaomei Feng: Faculty of Applied Sciences, Macao Polytechnic University, Macau, SAR, China
Hussam Al Hamadi: University of Dubai, Dubai, United Arab Emirates
Ernesto Damiani: EECS Department, Center for Cyber-Physical Systems, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
Chan Yeob Yeun: EECS Department, Center for Cyber-Physical Systems, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
Sivaprasad Nandyala: Secure Systems Research Center, Technology Innovation Institute, Abu Dhabi, United Arab Emirates

DOI: https://doi.org/10.1109/ACCESS.2024.3434629
Journal volume & issue: Vol. 12
pp. 115296 – 115305

Abstract

This paper focuses on optimizing machine learning (ML) training data by constructing compact data. It introduces the concept of compact data design, which aims to create an optimized dataset that maximizes benefits without the need to manage a vast amount of complex data. Improvements in methods for optimizing ML training have been incorporated into the development of artificial intelligence (AI) systems. Understanding ML training datasets is introduced as a facet of Explainable AI (XAI), which aims to make AI comprehensible to humans. Among XAI methods, the evaluation of input feature importance stands out as a way to enhance the accuracy of complex ML models. The paper proposes compact data design as a novel method for optimizing ML training through dataset reduction. The performance of an ML-based malware detection system and of its variant using compact data was assessed, and 99% accuracy was maintained. By applying a 76% reduced input dataset, the speed of ML training with the novel compact data design could be maximized, suggesting that an ML system trained in this manner could achieve statistically equivalent accuracy with only 57% of the original data sample size.
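
The general idea of ranking input features by importance and training on a reduced set can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not specify the importance measure, the classifier, or the dataset, so everything here is an assumption made for illustration. The data is synthetic, the importance score is a standardized gap between class means, the classifier is nearest-centroid, and the 30% keep fraction is arbitrary. The helper names (make_toy_dataset, importance_scores, compact_columns, centroid_accuracy) are hypothetical.

```python
import random
import statistics


def make_toy_dataset(n_samples=400, n_features=10, n_informative=3, seed=0):
    """Synthetic binary data (assumption): only the first n_informative columns carry signal."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n_samples):
        y = rng.randint(0, 1)
        row = [rng.gauss(2.0 * y if j < n_informative else 0.0, 1.0)
               for j in range(n_features)]
        rows.append(row)
        labels.append(y)
    return rows, labels


def importance_scores(rows, labels):
    """Stand-in importance measure: gap between class means, scaled by the column spread."""
    scores = []
    for j in range(len(rows[0])):
        col = [r[j] for r in rows]
        pos = [v for v, y in zip(col, labels) if y == 1]
        neg = [v for v, y in zip(col, labels) if y == 0]
        spread = statistics.pstdev(col) or 1.0  # guard against constant columns
        scores.append(abs(statistics.fmean(pos) - statistics.fmean(neg)) / spread)
    return scores


def compact_columns(scores, keep_fraction):
    """Indices of the top-scoring features, returned in column order."""
    k = max(1, round(len(scores) * keep_fraction))
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def centroid_accuracy(train, y_train, test, y_test, cols):
    """Accuracy of a nearest-centroid classifier that only looks at the columns in cols."""
    def centroid(label):
        members = [r for r, y in zip(train, y_train) if y == label]
        return [statistics.fmean(r[j] for r in members) for j in cols]

    c0, c1 = centroid(0), centroid(1)

    def dist(row, center):
        return sum((row[j] - cj) ** 2 for j, cj in zip(cols, center))

    correct = sum((dist(r, c1) < dist(r, c0)) == bool(y) for r, y in zip(test, y_test))
    return correct / len(test)


rows, labels = make_toy_dataset()
split = 300  # first 300 rows for training, the rest for evaluation
train, y_train = rows[:split], labels[:split]
test, y_test = rows[split:], labels[split:]

scores = importance_scores(train, y_train)
kept = compact_columns(scores, keep_fraction=0.3)  # arbitrary fraction, not from the paper
full_acc = centroid_accuracy(train, y_train, test, y_test, list(range(len(rows[0]))))
compact_acc = centroid_accuracy(train, y_train, test, y_test, kept)
print("kept columns:", kept)
print("accuracy, all features:", full_acc)
print("accuracy, compact features:", compact_acc)
```

Comparing the two printed accuracies shows whether discarding low-importance columns costs anything on this toy data. A real evaluation would need the paper's dataset, model, and statistical test.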

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
