Symmetry (Oct 2022)
MULBER: Effective Android Malware Clustering Using Evolutionary Feature Selection and Mahalanobis Distance Metric
Abstract
Symmetric and asymmetric patterns are fascinating phenomena that show a level of co-existence in mobile application behavior analyses. For example, static phenomena, such as information sharing through collaboration with known apps, is a good example of a symmetric model of communication, and app collusion, where apps collaborate dynamically with unknown malware apps, is an example of a serious threat with an asymmetric pattern. The symmetric nature of app collaboration can become vulnerable when a vulnerability called PendingIntent is exchanged during Inter-Component Communication (ICC). The PendingIntent (PI) vulnerability enables a flexible software model, where the PendingIntent creator app can temporarily share its own permissions and identity with the PendingIntent receiving app. The PendingIntent vulnerability does not require approval from the device user or Android OS to share the permissions and identity with other apps. This is called a PI leak, which can lead to malware attacks such as privilege escalation and component hijacking attacks. This vulnerability in the symmetric behavior of an application without validating an app’s privileges dynamically leads to the asymmetric phenomena that can damage the robustness of an entire system. In this paper, we propose MULBER, a lightweight machine learning method for the detection of Android malware communications that enables a cybersecurity system to analyze multiple patterns and learn from them to help prevent similar attacks and respond to changing behavior. MULBER can help cybersecurity teams to be more proactive in preventing dynamic PI-based communication threats and responding to active attacks in real time. MULBER performs a static binary analysis on the APK file and gathers approximately 10,755 features, reducing it to 42 key features by grouping the permissions under the above-mentioned four categories. Finally, MULBER learns from these multivariate features using evolutionary feature selection and the Mahalanobis distance metric and classifies them as either benign or malware apps. In an evaluation of 22,638 malware samples from recent Android APK malware databases such as Drebin and CICMalDroid-2020, MULBER outperformed others by clustering applications based on the Mahalanobis distance metric and detected 95.69% of malware with few false alarms and the explanations provided for each detection revealed the relevant properties of the detected malware.
Keywords