Static analysis framework for permission-based dataset generation and android malware detection using machine learning

Amarjyoti Pathak; Th. Shanta Kumar; Utpal Barman

doi:10.1186/s13635-024-00182-3

EURASIP Journal on Information Security (Oct 2024)

Affiliations

Amarjyoti Pathak: GIMT, Guwahati under Assam Science and Technology University
Th. Shanta Kumar: Department of CSE, Girijananda Chowdhury University
Utpal Barman: Faculty of Computer Technology, Assam down town University

DOI: https://doi.org/10.1186/s13635-024-00182-3
Journal volume & issue: Vol. 2024, no. 1, pp. 1–12

Abstract

Since Android is the most popular mobile operating system worldwide, attackers frequently target Android smartphones. Android malware can be identified through a number of established detection techniques. However, traditional signature-based and heuristic-based detection methods cannot cope with the challenges posed by modern malware. Previous research suggests that machine-learning classifiers can analyse permissions to differentiate between malicious and benign applications on the Android platform, and several machine-learning methods already use permission-based attributes to build malware detection models for Android devices. Nevertheless, the performance of these detection methods depends on the raw or feature datasets used. Android malware research frequently faces a major obstacle: the lack of adequate and up-to-date raw malware datasets. In this paper, we put forward a systematic approach to generating an Android permission-based dataset using static analysis. To create the dataset, we collect recent raw malware samples (APK files) and focus on reverse engineering and permission-based feature extraction. We also conduct a thorough feature analysis to determine the important Android permissions and present a machine-learning-based Android malware detection mechanism. The experimental results of our study demonstrate that, with just 48 features, the random forest classifier-based Android malware detection model obtains the best accuracy, 97.5%.
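The permission-based feature extraction step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes the APK has already been reverse-engineered so that `AndroidManifest.xml` is available as plain text (for example, by a decoding tool), and the names `extract_permissions`, `to_feature_vector`, `VOCABULARY`, and `SAMPLE_MANIFEST` are hypothetical. The three-permission vocabulary is a placeholder; the abstract reports 48 selected features but does not list them.

```python
import xml.etree.ElementTree as ET

# Namespace used for android:* attributes in AndroidManifest.xml.
ANDROID_NS = "http://schemas.android.com/apk/res/android"


def extract_permissions(manifest_xml):
    """Return the set of permission names requested in a decoded manifest."""
    root = ET.fromstring(manifest_xml)
    name_attr = "{%s}name" % ANDROID_NS
    return {
        element.attrib[name_attr]
        for element in root.iter("uses-permission")
        if name_attr in element.attrib
    }


def to_feature_vector(permissions, vocabulary):
    """Encode a permission set as a binary vector over a fixed vocabulary."""
    return [1 if permission in permissions else 0 for permission in vocabulary]


# Hypothetical vocabulary; a real dataset would use the selected permissions.
VOCABULARY = [
    "android.permission.INTERNET",
    "android.permission.SEND_SMS",
    "android.permission.READ_CONTACTS",
]

# Hypothetical decoded manifest for illustration only.
SAMPLE_MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.example.app">
    <uses-permission android:name="android.permission.INTERNET"/>
    <uses-permission android:name="android.permission.SEND_SMS"/>
</manifest>"""

permissions = extract_permissions(SAMPLE_MANIFEST)
vector = to_feature_vector(permissions, VOCABULARY)
print(vector)
```

Each application then becomes one row of 0/1 values, and rows labelled malicious or benign could be passed to a classifier such as a random forest for training.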

Published in EURASIP Journal on Information Security

ISSN: 2510-523X (Online)
Publisher: SpringerOpen
Country of publisher: United Kingdom
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics: Computer engineering. Computer hardware; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://jis.eurasipjournals.com/
