IEEE Access (Jan 2024)
Android Malware Detection Based on Informative Syscall Subsequences
Abstract
Android holds over 70% of the smartphone operating system market, and this widespread use has been accompanied by a growing number of malware applications. Static malware detection mechanisms are vulnerable to code obfuscation, whereas the runtime system call (syscall) sequence of an application is difficult for attackers to manipulate. Consequently, syscall-based malware detection mechanisms are gaining prominence. Current syscall-based approaches train machine learning classifiers on numerical features such as syscall frequencies and transition probability matrices. However, because these features span a wide range of values, classifiers need large datasets to train effectively and remain susceptible to noise and outliers. A binary representation of dynamic features is therefore needed to improve malware detection efficiency. To address this, we propose a binary feature representation based on syscall subsequences for machine learning-driven malware detection, using information gain to identify the informative subsequences. The proposed mechanism achieves 99% accuracy in detecting malware applications using only 50% of the training data on both the Drebin/AMD and CICMalDroid2020 datasets.
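As an informal illustration of the idea summarized above, the following Python sketch scores candidate syscall subsequences by information gain and encodes a trace as a binary presence vector. It is not the authors' implementation. It assumes that subsequences are contiguous n-grams of fixed length and that the top-k subsequences by gain are kept. The helper names (`ngrams`, `information_gain`, `select_subsequences`, `binary_features`) and the toy traces are hypothetical.

```python
import math
from collections import Counter


def ngrams(trace, n):
    """Return the set of contiguous length-n syscall subsequences in a trace."""
    return {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(present, labels):
    """Entropy reduction obtained by splitting labels on a binary presence flag."""
    with_f = [y for p, y in zip(present, labels) if p]
    without_f = [y for p, y in zip(present, labels) if not p]
    total = len(labels)
    conditional = ((len(with_f) / total) * entropy(with_f)
                   + (len(without_f) / total) * entropy(without_f))
    return entropy(labels) - conditional


def select_subsequences(traces, labels, n=2, top_k=1):
    """Rank every observed n-gram by information gain and keep the top_k."""
    per_trace = [ngrams(t, n) for t in traces]
    candidates = set().union(*per_trace)
    scores = {s: information_gain([s in grams for grams in per_trace], labels)
              for s in candidates}
    # Sort by descending gain; break ties lexicographically for determinism.
    return sorted(scores, key=lambda s: (-scores[s], s))[:top_k]


def binary_features(trace, selected, n=2):
    """Encode a trace as 1/0 flags: does it contain each selected subsequence?"""
    grams = ngrams(trace, n)
    return [1 if s in grams else 0 for s in selected]


# Toy traces (assumed data): label 0 = benign, label 1 = malware.
traces = [
    ["open", "read", "close"],
    ["open", "write", "close"],
    ["open", "read", "connect", "sendto"],
    ["socket", "connect", "sendto", "close"],
]
labels = [0, 0, 1, 1]

selected = select_subsequences(traces, labels, n=2, top_k=1)
features = [binary_features(t, selected) for t in traces]
```

In this toy data only the pair `("connect", "sendto")` appears in every malware trace and in no benign trace, so it has the highest gain and becomes the single binary feature. The resulting 0/1 vectors could then be passed to any standard classifier.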
Keywords