IEEE Access (Jan 2021)

Android Malware Detection Based on Composition Ratio of Permission Pairs

  • Hiroya Kato,
  • Takahiro Sasaki,
  • Iwao Sasase

DOI
https://doi.org/10.1109/ACCESS.2021.3113711
Journal volume & issue
Vol. 9
pp. 130006 – 130019

Abstract

Detecting Android malware is imperative. Among various detection schemes, permission-pair-based ones are promising for practical detection. However, conventional schemes cannot simultaneously meet the requirements for practical use in terms of efficiency, intelligibility, and stability of detection performance. Although the latest scheme relies on differences in frequent pairs between benign apps and malware, it cannot meet the stability requirement. This is because recent malware tends to request unnecessary permissions to imitate benign apps, which makes using the frequencies ineffective. To meet all the requirements, in this paper we propose Android malware detection based on a Composition Ratio (CR) of permission pairs. We define the CR as the ratio of a permission pair to all pairs in an app. We focus on the fact that the CR tends to be small in malware because of unnecessary permissions. To obtain features without using the frequencies, we construct databases of CRs. For each app, we calculate similarity scores based on the databases. Finally, eight scores are fed into machine learning (ML) based classifiers as features. By doing this, stable performance can be achieved. Since our features are only eight-dimensional, the proposed scheme takes less training time and is compatible with other ML-based schemes. Furthermore, our features quantitatively offer clear information that helps humans understand detection results. Our scheme is suitable for practical use because all the requirements can be met. Using real datasets, our results show that our scheme can detect malware with up to 97.3% accuracy. In addition, compared with an existing scheme, our scheme reduces the feature dimensions by about 99% while maintaining comparable accuracy on recent datasets.
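
The CR definition in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that each unordered pair of distinct permissions counts once per app, so every pair in an app with n permissions receives the same CR of 1 / C(n, 2). The function names and the two example permission lists are hypothetical, and the CR databases and eight similarity scores are not reproduced because the abstract does not specify them.

```python
from itertools import combinations


def permission_pairs(permissions):
    """Return all unordered pairs of distinct permissions requested by an app."""
    unique = sorted(set(permissions))
    return list(combinations(unique, 2))


def composition_ratio(permissions):
    """Map each permission pair to its share of all pairs in the app.

    Assumption (not stated in the abstract): each pair counts once, so
    every pair in the same app gets the same CR, 1 / (number of pairs).
    """
    pairs = permission_pairs(permissions)
    if not pairs:
        # An app with fewer than two permissions has no pairs.
        return {}
    return {pair: 1 / len(pairs) for pair in pairs}


# Hypothetical apps: the second pads the first with extra permissions.
lean_app = ["INTERNET", "READ_SMS", "SEND_SMS"]
padded_app = lean_app + ["CAMERA", "VIBRATE", "WAKE_LOCK"]

lean_cr = composition_ratio(lean_app)
padded_cr = composition_ratio(padded_app)

pair = ("READ_SMS", "SEND_SMS")
print(f"lean app:   {len(lean_cr)} pairs, CR{pair} = {lean_cr[pair]:.3f}")
print(f"padded app: {len(padded_cr)} pairs, CR{pair} = {padded_cr[pair]:.3f}")
```

Under this assumption, padding an app with extra permissions increases the total number of pairs and so lowers the CR of every pair, which is consistent with the abstract's observation that the CR tends to be small in malware.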

Keywords