Efficient Deep Learning Network With Multi-Streams for Android Malware Family Classification

Hyun-Il Kim; Moonyoung Kang; Seong-Je Cho; Sang-Il Choi

doi:10.1109/ACCESS.2021.3139334

IEEE Access (Jan 2022)

Affiliations

Hyun-Il Kim: Department of Computer Science and Engineering, Dankook University, Jukjeon-ro, Suji-gu, Yongin-si, Gyeonggi-do, Republic of Korea
Moonyoung Kang: Department of Software Science, Dankook University, Jukjeon-ro, Suji-gu, Yongin-si, Gyeonggi-do, Republic of Korea
Seong-Je Cho: Department of Computer Science and Engineering, Dankook University, Jukjeon-ro, Suji-gu, Yongin-si, Gyeonggi-do, Republic of Korea
Sang-Il Choi: Department of Computer Science and Engineering, Dankook University, Jukjeon-ro, Suji-gu, Yongin-si, Gyeonggi-do, Republic of Korea

DOI: https://doi.org/10.1109/ACCESS.2021.3139334
Journal volume & issue: Vol. 10
pp. 5518 – 5532

Abstract

It is important to effectively detect, mitigate, and defend against Android malware attacks, because Android malware has long represented a major threat to Android app security. Characterizing and classifying similar malicious apps into groups plays a particularly crucial role in building a secure Android app ecosystem. The classification of malware families can efficiently enhance the malware detection process and systematically elucidate malware patterns. In this paper, we propose a novel efficient deep learning network with multi-streams for Android malware family classification. We first obtain the input data for a convolutional neural network (CNN) in string format from some main files or sections contained in each Android malicious app. We then classify malware families by applying a 1-dimensional convolution filter-based network for the files or sections. Further, by using gradient analysis to visualize the important files and sections in malicious apps, we attempt to intuitively grasp which files or sections are the most significant for malware family classification. To validate the effectiveness of our approach, we conduct extensive experiments with the well-known DREBIN and AMD malware datasets, and we compare our approach with existing methods. Our experimental results show that the 1D CNN model is more accurate than the 2D CNN model, and that the code_item part in the classes.dex is the most relevant feature for malware classification, as it is more relevant than other parts such as AndroidManifest.xml and certificate. The proposed method achieves the best accuracy of 93.2% by using 1D convolution filters with multi-streams for the main files and sections of the malware samples.
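
The multi-stream 1D convolution idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' network: the function names, the fixed input length, the hand-picked kernels, and the ReLU plus global max pooling step are all assumptions made for illustration, and a real model would learn its filters and feed the concatenated features to a classifier. Each stream stands for the raw bytes of one file or section (for example AndroidManifest.xml or the code_item part of classes.dex).

```python
from typing import Dict, List, Sequence


def bytes_to_signal(data: bytes, length: int) -> List[float]:
    """Scale bytes to [0, 1], then truncate or zero-pad to a fixed length."""
    values = [b / 255.0 for b in data[:length]]
    return values + [0.0] * (length - len(values))


def conv1d(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Valid-mode 1D cross-correlation, the operation a 1D conv layer applies."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def stream_features(
    signal: Sequence[float], kernels: Sequence[Sequence[float]]
) -> List[float]:
    """Apply ReLU and global max pooling: one feature per kernel (assumed design)."""
    return [max(max(v, 0.0) for v in conv1d(signal, kernel)) for kernel in kernels]


def multi_stream_features(
    streams: Dict[str, bytes],
    kernels: Sequence[Sequence[float]],
    length: int = 16,
) -> List[float]:
    """Process each stream independently, then concatenate the pooled features."""
    features: List[float] = []
    for name in sorted(streams):  # fixed order keeps the feature layout stable
        signal = bytes_to_signal(streams[name], length)
        features.extend(stream_features(signal, kernels))
    return features


# Hypothetical toy inputs; real streams would be bytes read from an APK.
example = multi_stream_features(
    {"manifest": b"\xff\x00\xff", "code_item": bytes([255] * 4)},
    kernels=[[1.0, 1.0], [1.0, -1.0]],
    length=4,
)
```

Keeping one feature block per stream is what would later allow a per-stream relevance analysis, such as the gradient-based visualization the abstract mentions, although that step is not shown here.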

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
