IEEE Access (Jan 2019)

A3CM: Automatic Capability Annotation for Android Malware

  • Junyang Qiu,
  • Jun Zhang,
  • Wei Luo,
  • Lei Pan,
  • Surya Nepal,
  • Yu Wang,
  • Yang Xiang

DOI
https://doi.org/10.1109/ACCESS.2019.2946392
Journal volume & issue
Vol. 7
pp. 147156 – 147168

Abstract

Read online

Android malware poses serious security and privacy threats to the mobile users. Traditional malware detection and family classification technologies are becoming less effective due to the rapid evolution of the malware landscape, with the emerging of so-called zero-day-family malware families. To address this issue, our paper presents a novel research problem on automatically identifying the security/privacy-related capabilities of any detected malware, which we refer to as Malware Capability Annotation (MCA). Motivated by the observation that known and zero-day-family malware families share the security/privacy-related capabilities, MCA opens a new alternative way to effectively analyze zero-day-family malware (the malware that do not belong to any existing families) through exploring the related information and knowledge from known malware families. To address the MCA problem, we design a new MCA hunger solution, Automatic Capability Annotation for Android Malware (A3CM). A3CM works in the following four steps: 1) A3CM automatically extracts a set of semantic features such as permissions, API calls, network addresses from raw binary APKs to characterize malware samples; 2) A3CM applies a statistical embedding method to map the features into a joint feature space, so that malware samples can be represented as numerical vectors; 3) A3CM infers the malicious capabilities by using the multi-label classification model; 4) The trained multi-label model is used to annotate the malicious capabilities of the candidate malware samples. To facilitate the new research of MCA, we create a new ground truth dataset that consists of 6,899 annotated Android malware samples from 72 families. We carry out a large number of experiments based on the four representative security/privacy-related capabilities to evaluate the effectiveness of A3CM. Our results show that A3CM can achieve promising accuracy of 1.00, 0.98 and 0.63 in inferring multiple capabilities of known Android malware, small size-families' malware and zero-day-families' Android malware, respectively.

Keywords