Obfuscated malware detection using deep neural network with ANOVA feature selection on CIC-MalMem-2022 dataset

Mourad Hadjila; Mohammed Merzoug; Wafaa Ferhi; Djillali Moussaoui; Al Baraa Bouidaine; Mohammed Hicham Hachemi

doi:10.17586/2226-1494-2024-24-5-849-857

Naučno-tehničeskij Vestnik Informacionnyh Tehnologij, Mehaniki i Optiki (Oct 2024)

Affiliations

Mourad Hadjila: D.Sc., Lecturer-Researcher, University of Tlemcen, Tlemcen, 13000, Algeria, sc 56440246000
Mohammed Merzoug: D.Sc., Lecturer-Researcher, University of Tlemcen, Tlemcen, 13000, Algeria, sc 55309175500
Wafaa Ferhi: PhD Student, University of Tlemcen, Tlemcen, 13000, Algeria
Djillali Moussaoui: D.Sc., Lecturer-Researcher, University of Tlemcen, Tlemcen, 13000, Algeria, sc 56360232600
Al Baraa Bouidaine: PhD Student, University of Tlemcen, Tlemcen, 13000, Algeria
Mohammed Hicham Hachemi: D.Sc., Lecturer-Researcher, University of Oran, Oran, 31000, Algeria, sc 57196009731

Journal volume & issue: Vol. 24, no. 5, pp. 849–857

Abstract

Malware analysis is the process of dissecting malicious software to understand its functionality, behavior, and potential risks. Artificial Intelligence (AI) and deep learning are ushering in a new era of automated, intelligent, and adaptive malware analysis, and their convergence promises to change the way cybersecurity professionals detect, analyze, and respond to malware threats. This paper proposes a Deep Neural Network (DNN) model built on features selected by the ANalysis Of Variance (ANOVA) F-test (DNN-ANOVA), with the aim of increasing accuracy by identifying informative features. ANOVA is a feature selection method suited to numerical input data with a categorical target variable. The k most relevant features are those whose scores exceed a threshold equal to the sum of all feature scores divided by the total number of features, that is, the mean score. Experiments are conducted on the CIC-MalMem-2022 dataset. Malware analysis is performed using binary classification, to detect the presence or absence of malware, and multiclass classification, to detect not only the malware but also its type. According to the test results, the DNN-ANOVA model achieves best values of 100 %, 99.99 %, 99.99 %, and 99.98 % for precision, accuracy, F1-score, and recall, respectively, in binary classification. In multiclass classification, DNN-ANOVA outperforms existing works with overall accuracy rates of 85.83 % for family attacks and 73.98 % for individual attacks.
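The selection rule described in the abstract (keep the features whose ANOVA F-score is above the mean of all scores) can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy data, and the handling of zero-variance columns are assumptions made for illustration, and the F-statistic is computed in plain Python so the example has no dependencies. In practice a library routine such as scikit-learn's `f_classif` computes the same one-way ANOVA statistic.

```python
from collections import defaultdict


def anova_f_scores(X, y):
    """One-way ANOVA F-statistic of each column of X against class labels y."""
    n_samples = len(X)
    n_features = len(X[0])

    # Group row indices by class label.
    groups = defaultdict(list)
    for i, label in enumerate(y):
        groups[label].append(i)
    n_classes = len(groups)

    scores = []
    for j in range(n_features):
        col = [row[j] for row in X]
        grand_mean = sum(col) / n_samples
        ss_between = 0.0  # variation of class means around the grand mean
        ss_within = 0.0   # variation of samples around their class mean
        for idx in groups.values():
            vals = [col[i] for i in idx]
            mean = sum(vals) / len(vals)
            ss_between += len(vals) * (mean - grand_mean) ** 2
            ss_within += sum((v - mean) ** 2 for v in vals)
        ms_between = ss_between / (n_classes - 1)
        ms_within = ss_within / (n_samples - n_classes)
        # F is undefined when within-class variance is zero; this sketch
        # scores such columns 0.0 as a simplification (an assumption).
        scores.append(ms_between / ms_within if ms_within > 0 else 0.0)
    return scores


def select_above_mean(scores):
    """Return indices of features whose score exceeds the mean score."""
    threshold = sum(scores) / len(scores)
    selected = [j for j, s in enumerate(scores) if s > threshold]
    return selected, threshold


# Hypothetical toy data: 6 samples, 3 numerical features, binary target.
# Only feature 0 separates the two classes clearly.
X = [
    [1.0, 5.0, 0.2],
    [1.2, 3.0, 0.9],
    [0.9, 4.0, 0.4],
    [3.0, 4.5, 0.5],
    [3.1, 3.5, 0.3],
    [2.9, 4.2, 0.8],
]
y = [0, 0, 0, 1, 1, 1]

scores = anova_f_scores(X, y)
selected, threshold = select_above_mean(scores)
print("F-scores:", [round(s, 3) for s in scores])
print("threshold (mean score):", round(threshold, 3))
print("selected feature indices:", selected)
```

With this rule the number of retained features k is not fixed in advance; it follows from how the scores are distributed around their mean. The retained columns would then be passed to the classifier, which in the paper is a DNN.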

Published in Naučno-tehničeskij Vestnik Informacionnyh Tehnologij, Mehaniki i Optiki

ISSN: 2226-1494 (Print); 2500-0373 (Online)
Publisher: Saint Petersburg National Research University of Information Technologies, Mechanics and Optics (ITMO University)
Country of publisher: Russian Federation
LCC subjects: Science: Physics: Optics. Light; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://ntv.ifmo.ru/en/english.htm
