IEEE Access (Jan 2021)
Impact of Code Deobfuscation and Feature Interaction in Android Malware Detection
Abstract
With more than three million applications already in the Android marketplace, various malware detection systems based on machine learning have been proposed to prevent attacks from cybercriminals; most of these systems use static analysis to extract application features. However, many features generated by static analysis can be easily thwarted by obfuscation techniques. Several researchers have therefore addressed the obfuscation problem with obfuscation-invariant features, but, to the best of our knowledge, none has utilized deobfuscation techniques. To this end, we integrate a code deobfuscation technique into an Android malware detection system and investigate its effects. Experimental results indicate that code deobfuscation can successfully retrieve useful information concealed by obfuscation. Further, we propose interaction terms based on identified feature interactions. Because many feature values are correlated with the size of the application, the proposed interaction terms aim to eliminate the interference caused by application size and other features. In addition, the experimental results indicate that these interaction terms rank highly in terms of feature importance. Our proposed Android malware detection model achieves 99.55% accuracy and a 94.61% F1-score on the well-known Drebin dataset, outperforming previous works.
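The idea of removing size-related interference from feature values can be illustrated with a minimal sketch. The abstract does not state the exact form of the interaction terms, so the sketch assumes a simple ratio (feature value divided by application size); the function name `add_size_ratio_terms` and the feature names `app_size` and `api_calls` are hypothetical and are not taken from the paper.

```python
from typing import Dict


def add_size_ratio_terms(
    features: Dict[str, float], size_key: str = "app_size"
) -> Dict[str, float]:
    """Return a copy of `features` extended with per-size ratio terms.

    Assumption: an interaction term is modeled as value / size. The paper
    may define its interaction terms differently.
    """
    size = features[size_key]
    if size <= 0:
        raise ValueError("application size must be positive")

    extended = dict(features)
    for name, value in features.items():
        if name != size_key:
            # Ratio term: the feature value relative to application size.
            extended[f"{name}_per_{size_key}"] = value / size
    return extended


# Two hypothetical apps whose raw counts differ only because of their size.
small_app = {"app_size": 1000.0, "api_calls": 50.0}
large_app = {"app_size": 10000.0, "api_calls": 500.0}

small_ext = add_size_ratio_terms(small_app)
large_ext = add_size_ratio_terms(large_app)
```

In this toy example the raw `api_calls` counts differ by a factor of ten, while the ratio term is identical for both apps, which is the kind of size effect the interaction terms are meant to remove.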
Keywords