Computers (Aug 2024)
Achieving High Accuracy in Android Malware Detection through Genetic Programming Symbolic Classifier
Abstract
The detection of Android malware is of paramount importance for safeguarding users’ personal and financial data from theft and misuse. It plays a critical role in ensuring the security and privacy of sensitive information on mobile devices, thereby preventing unauthorized access and potential damage. Moreover, effective malware detection is essential for maintaining device performance and reliability by mitigating the risks posed by malicious software. This paper introduces a novel approach to Android malware detection, leveraging a publicly available dataset in conjunction with a Genetic Programming Symbolic Classifier (GPSC). The primary objective is to generate symbolic expressions (SEs) that can accurately identify malware with high precision. To address the challenge of imbalanced class distribution within the dataset, various oversampling techniques are employed. Optimal hyperparameter configurations for GPSC are determined through a random hyperparameter values search (RHVS) method developed in this research. The GPSC model is trained using a 10-fold cross-validation (10FCV) technique, producing a set of 10 SEs for each dataset variation. Subsequently, the most effective SEs are integrated into a threshold-based voting ensemble (TBVE) system, which is then evaluated on the original dataset. The proposed methodology achieves a maximum accuracy of 0.956, thereby demonstrating its effectiveness for Android malware detection.
Keywords