An ensemble approach for imbalanced multiclass malware classification using 1D-CNN

Binayak Panda; Sudhanshu Shekhar Bisoyi; Sidhanta Panigrahy

doi:10.7717/peerj-cs.1677

PeerJ Computer Science (Nov 2023)

Affiliations

Binayak Panda: Department of Computer Science and Engineering, Institute of Technical Education and Research, Siksha ‘O’ Anusandhan (Deemed to be) University, Bhubaneswar, Odisha, India
Sudhanshu Shekhar Bisoyi: Department of Computer Science and Information Technology, Institute of Technical Education and Research, Siksha ‘O’ Anusandhan (Deemed to be) University, Bhubaneswar, Odisha, India
Sidhanta Panigrahy: Haas School of Business, University of California, Berkeley, Berkeley, CA, United States of America

DOI: https://doi.org/10.7717/peerj-cs.1677
Journal volume & issue: Vol. 9, p. e1677

Abstract

Growing dependence on the internet and computer programs underscores their significance in day-to-day life. This demand motivates malware developers to create more malware, in both quantity and variety. Because malware authors use code obfuscation techniques, researchers face constant hurdles in defending against the resulting threats. Metamorphic and polymorphic variants easily elude the widely used signature-based detection procedures. To analyze the behavior of such a vast number of malware variants, researchers are more interested in deep learning approaches than in conventional machine learning techniques. Beyond classifying malware against benign programs, researchers have also been drawn to categorizing malware into its own classes to examine the behavioral differences between them. This work uses a one-dimensional convolutional neural network (1D-CNN) model to solve a multiclass classification problem, investigating the relationships between application programming interface (API) calls within API sequences in order to classify them. Feature vectors for the distinct APIs are created from the API sequences using the Word2Vec word embedding approach with the skip-gram model. 1D-CNN models are trained with the one-vs.-rest approach to categorize malware, and their outputs are then combined with a proposed ModifiedSoftVoting algorithm to improve classification. On the open benchmark dataset Mal-API-2019, the proposed ensembled 1D-CNN architecture achieves an accuracy of 0.90, a weighted average F1-score of 0.90, and an AUC score of more than 0.96 for all classes of malware.
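The one-vs.-rest setup described in the abstract can be illustrated with a minimal sketch. It assumes each binary model outputs a positive-class probability per sample, and it combines them with a plain argmax over those scores. This is a common baseline, not the authors' ModifiedSoftVoting algorithm, whose details are not given here. The class names, scores, and function names (`one_vs_rest_targets`, `combine_one_vs_rest`) are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def one_vs_rest_targets(labels: List[str], positive: str) -> List[int]:
    """Binary targets for one one-vs.-rest model: 1 for `positive`, else 0."""
    return [1 if label == positive else 0 for label in labels]


def combine_one_vs_rest(scores: Dict[str, List[float]]) -> List[str]:
    """Baseline combination (an assumption, not the paper's method):
    for each sample, pick the class whose binary model reports the
    highest positive-class probability."""
    classes = list(scores)
    n_samples = len(scores[classes[0]])
    if any(len(scores[c]) != n_samples for c in classes):
        raise ValueError("every model must score the same number of samples")
    predictions = []
    for i in range(n_samples):
        best = max(classes, key=lambda c: scores[c][i])
        predictions.append(best)
    return predictions


# Illustrative labels for four samples (hypothetical data).
labels = ["Trojan", "Worm", "Virus", "Worm"]
trojan_targets = one_vs_rest_targets(labels, "Trojan")

# Hypothetical positive-class probabilities from three binary models,
# standing in for the outputs of trained 1D-CNNs.
scores = {
    "Trojan": [0.91, 0.10, 0.30, 0.20],
    "Worm": [0.05, 0.80, 0.25, 0.55],
    "Virus": [0.12, 0.15, 0.70, 0.40],
}
predicted = combine_one_vs_rest(scores)

print(trojan_targets)
print(predicted)
```

In this sketch each class gets its own binary model, which lets minority classes be trained against all remaining samples. The combination step is where a modified voting rule, such as the one the authors propose, would replace the plain argmax.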

Published in PeerJ Computer Science

ISSN: 2376-5992 (Online)
Publisher: PeerJ Inc.
Country of publisher: United States
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://peerj.com/computer-science/
