A Closer Look at Machine Learning Effectiveness in Android Malware Detection

Filippos Giannakas; Vasileios Kouliaridis; Georgios Kambourakis

doi:10.3390/info14010002

Information (Dec 2022)

Affiliations

Filippos Giannakas: Department of Information and Communication Engineering, University of the Aegean, 83200 Karlovasi, Samos, Greece
Vasileios Kouliaridis: Department of Information and Communication Engineering, University of the Aegean, 83200 Karlovasi, Samos, Greece
Georgios Kambourakis: Department of Information and Communication Engineering, University of the Aegean, 83200 Karlovasi, Samos, Greece

Journal volume & issue: Vol. 14, no. 1, p. 2

Abstract

With the increasing use of Android devices in daily life, malware has proliferated rapidly, putting people’s security and privacy at risk. To mitigate this threat, researchers have proposed a variety of methods to detect Android malware, and machine learning based models have recently been explored by a large number of them. However, selecting the most appropriate model is not straightforward, since several aspects must be considered. Contributing to this domain, the current paper explores Android malware detection from diverse perspectives by optimizing and evaluating various machine learning algorithms. Specifically, we conducted an experiment for training, optimizing, and evaluating 27 machine learning algorithms and a Deep Neural Network (DNN). During the optimization phase, we performed hyperparameter analysis using the Optuna framework. In the evaluation phase, we measured different performance metrics against a contemporary, rich dataset to identify the most accurate model. The best model was further interpreted through feature analysis using the Shapley Additive Explanations (SHAP) framework. Our experimental results showed that the best model is a DNN consisting of four layers (two hidden), using the Adamax optimizer, the Binary Cross-Entropy loss function, and the Softsign activation function. The model achieved 86% prediction accuracy, while the balanced accuracy, the F1-score, and the ROC-AUC metrics were at 82%.
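
The abstract reports plain accuracy alongside balanced accuracy and the F1-score, and these can diverge when benign and malicious samples are not equally represented. The following minimal sketch shows how the three metrics are derived from a binary confusion matrix. The function name `binary_metrics` and the counts passed to it are hypothetical, chosen only for illustration; they are not taken from the paper's dataset or results.

```python
def binary_metrics(tp, fp, tn, fn):
    """Derive common binary-classification metrics from confusion-matrix counts.

    Assumes both classes are present and at least one positive prediction
    was made, so that no denominator is zero.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total           # share of all samples classified correctly
    recall = tp / (tp + fn)                # true positive rate (malware detected)
    specificity = tn / (tn + fp)           # true negative rate (benign apps passed)
    precision = tp / (tp + fp)             # share of malware alerts that are correct
    balanced_accuracy = (recall + specificity) / 2
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
        "f1": f1,
    }


# Hypothetical, imbalanced test set: 250 malicious and 750 benign samples.
metrics = binary_metrics(tp=150, fp=40, tn=710, fn=100)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these illustrative counts, plain accuracy is higher than balanced accuracy because the majority (benign) class is classified more reliably than the minority class, which is one reason to report several metrics rather than accuracy alone.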

Published in Information

ISSN: 2078-2489 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: http://www.mdpi.com/journal/information/
