A3CM: Automatic Capability Annotation for Android Malware

Junyang Qiu; Jun Zhang; Wei Luo; Lei Pan; Surya Nepal; Yu Wang; Yang Xiang

doi:10.1109/ACCESS.2019.2946392

IEEE Access (Jan 2019)

Affiliations

Junyang Qiu: School of Information Technology, Deakin University, Geelong, VIC, Australia
Jun Zhang: School of Software and Electrical Engineering, Swinburne University of Technology, Melbourne, VIC, Australia
Wei Luo: School of Information Technology, Deakin University, Geelong, VIC, Australia
Lei Pan: School of Information Technology, Deakin University, Geelong, VIC, Australia
Surya Nepal: CSIRO, Data61, Sydney, NSW, Australia
Yu Wang: School of Computer Science, Guangzhou University, Guangzhou, China
Yang Xiang: School of Software and Electrical Engineering, Swinburne University of Technology, Melbourne, VIC, Australia

DOI: https://doi.org/10.1109/ACCESS.2019.2946392
Journal volume & issue: Vol. 7, pp. 147156–147168

Abstract

Android malware poses serious security and privacy threats to mobile users. Traditional malware detection and family classification technologies are becoming less effective due to the rapid evolution of the malware landscape and the emergence of so-called zero-day-family malware. To address this issue, our paper presents a novel research problem: automatically identifying the security/privacy-related capabilities of any detected malware, which we refer to as Malware Capability Annotation (MCA). Motivated by the observation that known and zero-day-family malware share security/privacy-related capabilities, MCA opens a new way to effectively analyze zero-day-family malware (malware that does not belong to any existing family) by exploiting the related information and knowledge from known malware families. To address the MCA problem, we design a new MCA solution, Automatic Capability Annotation for Android Malware (A3CM). A3CM works in the following four steps: 1) A3CM automatically extracts a set of semantic features, such as permissions, API calls, and network addresses, from raw binary APKs to characterize malware samples; 2) A3CM applies a statistical embedding method to map the features into a joint feature space, so that malware samples can be represented as numerical vectors; 3) A3CM infers the malicious capabilities using a multi-label classification model; 4) the trained multi-label model is used to annotate the malicious capabilities of candidate malware samples. To facilitate new research on MCA, we create a new ground truth dataset that consists of 6,899 annotated Android malware samples from 72 families. We carry out a large number of experiments on four representative security/privacy-related capabilities to evaluate the effectiveness of A3CM. Our results show that A3CM achieves accuracies of 1.00, 0.98, and 0.63 in inferring multiple capabilities of known Android malware, malware from small families, and zero-day-family Android malware, respectively.
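
The four steps in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes the features have already been extracted as strings, swaps the paper's statistical embedding for a plain binary presence vector, and swaps its multi-label model for one nearest-centroid test per capability (binary relevance). The feature strings and the capability names `sms_abuse` and `info_theft` are hypothetical placeholders, not taken from the paper.

```python
# Toy sketch of a capability-annotation pipeline (hypothetical data and labels).

def build_vocabulary(samples):
    """Step 2a: index every distinct feature string seen in training."""
    vocab = sorted({feat for feats in samples for feat in feats})
    return {feat: i for i, feat in enumerate(vocab)}

def embed(features, vocab):
    """Step 2b: binary presence vector; features not in the vocabulary are ignored."""
    vec = [0] * len(vocab)
    for feat in features:
        if feat in vocab:
            vec[vocab[feat]] = 1
    return vec

def centroid(vectors, dim):
    """Component-wise mean of a list of vectors (all zeros if the list is empty)."""
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train(vectors, label_sets, labels):
    """Step 3: per capability, store centroids of samples with and without it."""
    dim = len(vectors[0])
    model = {}
    for label in labels:
        pos = [v for v, ls in zip(vectors, label_sets) if label in ls]
        neg = [v for v, ls in zip(vectors, label_sets) if label not in ls]
        model[label] = (centroid(pos, dim), centroid(neg, dim))
    return model

def annotate(vec, model):
    """Step 4: assign each capability whose positive centroid is strictly closer."""
    return {
        label
        for label, (pos, neg) in model.items()
        if sq_dist(vec, pos) < sq_dist(vec, neg)
    }

# Step 1 stand-in: hand-written feature sets instead of real APK extraction.
train_features = [
    {"permission:SEND_SMS", "api:sendTextMessage"},
    {"permission:READ_CONTACTS", "api:getDeviceId", "url:example.com"},
    {"permission:SEND_SMS", "permission:READ_CONTACTS",
     "api:sendTextMessage", "api:getDeviceId"},
    {"permission:INTERNET"},
]
train_labels = [
    {"sms_abuse"},
    {"info_theft"},
    {"sms_abuse", "info_theft"},
    set(),
]

vocab = build_vocabulary(train_features)
vectors = [embed(feats, vocab) for feats in train_features]
model = train(vectors, train_labels, ["sms_abuse", "info_theft"])

# A candidate sample with one feature never seen during training.
candidate = {"permission:SEND_SMS", "api:sendTextMessage", "url:unknown.example"}
predicted = annotate(embed(candidate, vocab), model)
print(sorted(predicted))
```

Because each capability is decided independently, a sample can receive zero, one, or several capability labels, which is the property that distinguishes capability annotation from assigning a single family label.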

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639