IEEE Access (Jan 2018)

Dalvik Opcode Graph Based Android Malware Variants Detection Using Global Topology Features

  • Jixin Zhang,
  • Zheng Qin,
  • Kehuan Zhang,
  • Hui Yin,
  • Jingfu Zou

DOI
https://doi.org/10.1109/ACCESS.2018.2870534
Journal volume & issue
Vol. 6
pp. 51964 – 51974

Abstract

With Android dominating the smartphone operating system market at an 86.8% share, the number of malicious Android applications is also increasing rapidly. The large volume of diversified malware variants has pushed researchers toward machine learning, which offers a powerful capability for variant detection. Because static analysis of malware plays an important role in system security and opcodes have been shown to be an effective representation of malware, some researchers use Dalvik opcodes as features and apply machine learning to detect Android malware. However, current opcode-based methods still face several problems: balancing accuracy against time cost, selecting features, and a lack of understanding or description of malware characteristics. To overcome these challenges, we propose a novel method that builds a graph of Dalvik opcodes and analyzes its global topology properties. The method first constructs a weighted probability graph of operations, then uses information entropy to prune the graph while retaining as much information as possible, next extracts several global topology features of the graph to represent malware, and finally searches for similarities between programs using these features. These global topology features capture the high-level characteristics of malware. Our approach provides a lightweight framework for detecting Android malware variants based on graph theory and information theory. Theoretical analysis and real-life experimental results show the effectiveness, efficiency, and robustness of our approach, which achieves high detection accuracy with little training and detection time.
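
The four stages named in the abstract (probability graph, entropy-based pruning, global topology features, similarity search) can be illustrated with a minimal Python sketch. The abstract does not give the paper's actual definitions, so everything concrete here is an assumption made for illustration: the function names, the pruning rule (keep the edges that account for a `keep` fraction of each node's Shannon entropy), the four features chosen, and cosine similarity as the comparison.

```python
import math
from collections import Counter, defaultdict


def build_opcode_graph(opcodes):
    """Weighted probability graph: edge (a, b) holds P(b directly follows a)."""
    counts = defaultdict(Counter)
    for src, dst in zip(opcodes, opcodes[1:]):
        counts[src][dst] += 1
    return {
        src: {dst: n / sum(nxt.values()) for dst, n in nxt.items()}
        for src, nxt in counts.items()
    }


def prune_by_entropy(graph, keep=0.9):
    """Hypothetical pruning rule: per node, keep the edges contributing most
    to its Shannon entropy until `keep` of that entropy is retained."""
    pruned = {}
    for src, edges in graph.items():
        # Each edge contributes -p * log2(p) to the node's entropy.
        contrib = {dst: -p * math.log2(p) for dst, p in edges.items()}
        total = sum(contrib.values())
        if total == 0:  # a single certain transition carries no entropy
            pruned[src] = dict(edges)
            continue
        kept, running = {}, 0.0
        for dst in sorted(contrib, key=contrib.get, reverse=True):
            kept[dst] = edges[dst]
            running += contrib[dst]
            if running >= keep * total:
                break
        pruned[src] = kept
    return pruned


def topology_features(graph):
    """Illustrative global features: node count, edge count, density
    (self-loops allowed, so n * n possible edges), mean edge weight."""
    nodes = set(graph)
    for edges in graph.values():
        nodes.update(edges)
    weights = [p for edges in graph.values() for p in edges.values()]
    n, m = len(nodes), len(weights)
    density = m / (n * n) if n else 0.0
    mean_weight = sum(weights) / m if m else 0.0
    return [float(n), float(m), density, mean_weight]


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy opcode traces standing in for two disassembled programs.
sample_a = ["const", "invoke", "move", "invoke", "move", "return"]
sample_b = ["const", "invoke", "move", "return"]
feats_a = topology_features(prune_by_entropy(build_opcode_graph(sample_a)))
feats_b = topology_features(prune_by_entropy(build_opcode_graph(sample_b)))
print(cosine_similarity(feats_a, feats_b))
```

In this sketch the raw node and edge counts dominate the cosine score, so a practical variant would likely scale the features before comparing them; that choice, like the rest of the example, is not taken from the paper.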

Keywords