IEEE Access (Jan 2018)

DroidEnsemble: Detecting Android Malicious Applications With Ensemble of String and Structural Static Features

  • Wei Wang,
  • Zhenzhen Gao,
  • Meichen Zhao,
  • Yidong Li,
  • Jiqiang Liu,
  • Xiangliang Zhang

DOI
https://doi.org/10.1109/ACCESS.2018.2835654
Journal volume & issue
Vol. 6
pp. 31798 – 31807

Abstract

Android has become the dominant operating system for mobile devices. However, the dramatic increase in malicious Android applications (malapps) has caused serious software failures on Android systems and poses a great threat to users. The effective detection of Android malapps has thus become an emerging yet crucial issue. Characterizing the behaviors of Android applications (apps) is essential to detecting malapps. Most existing work on detecting Android malapps relies mainly on string static features, such as permissions and API usage extracted from apps. There is also work that detects Android malapps with structural features, such as control flow graphs and data flow graphs. As Android malapps have become increasingly polymorphic and sophisticated, using only one type of static feature may result in false negatives. In this paper, we propose DroidEnsemble, which takes advantage of both string features and structural features to systematically and comprehensively characterize the static behaviors of Android apps and thus build a more accurate model for detecting Android malapps. We extract each app's string features, including permissions, hardware features, filter intents, restricted API calls, used permissions, and code patterns, as well as structural features such as the function call graph. We then use three machine learning algorithms, namely support vector machine, k-nearest neighbor, and random forest, to evaluate the performance of these two types of features and of their ensemble. In the experiments, we evaluate our methods and models with 1386 benign apps and 1296 malapps. Extensive experimental results demonstrate the effectiveness of DroidEnsemble. It achieves a detection accuracy of 95.8% with only string features and of 90.68% with only structural features. With the ensemble of both types of features, DroidEnsemble reaches a detection accuracy of 98.4%, with 9 fewer false positives and 12 fewer false negatives than with string features alone.
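
The idea of combining string and structural features can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract does not say how the ensemble is formed. The sketch assumes the simplest option: concatenating a binary string-feature vector with a few summary statistics of the function call graph, then classifying with a small k-nearest-neighbor vote. All function names, the app dictionary layout (`strings`, `call_graph`), and the chosen graph statistics are hypothetical.

```python
from collections import Counter


def string_features(app, vocabulary):
    """Binary vector: 1.0 if the app contains the named string feature."""
    present = set(app["strings"])
    return [1.0 if name in present else 0.0 for name in vocabulary]


def structural_features(app):
    """Toy call-graph summary: node count, edge count, maximum out-degree.

    Assumption: the call graph is a dict mapping caller -> list of callees.
    """
    graph = app["call_graph"]
    nodes = set(graph) | {c for callees in graph.values() for c in callees}
    edges = sum(len(callees) for callees in graph.values())
    max_out = max((len(callees) for callees in graph.values()), default=0)
    return [float(len(nodes)), float(edges), float(max_out)]


def combined_features(app, vocabulary):
    """Assumed ensemble: concatenate string and structural vectors."""
    return string_features(app, vocabulary) + structural_features(app)


def knn_predict(train, query, k=3):
    """Majority vote among the k training vectors nearest to the query.

    train is a list of (vector, label) pairs; distance is squared Euclidean.
    """
    def sq_dist(vector):
        return sum((a - b) ** 2 for a, b in zip(vector, query))

    nearest = sorted(train, key=lambda item: sq_dist(item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

A realistic pipeline would scale the features so that graph counts do not dominate the binary string features, and would compare several classifiers, as the abstract describes with support vector machine, k-nearest neighbor, and random forest.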

Keywords