Applied Sciences (Nov 2023)

SSCL-TransMD: Semi-Supervised Continual Learning Transformer for Malicious Software Detection

  • Liang Kou,
  • Donghui Zhao,
  • Hui Han,
  • Xiong Xu,
  • Shuaige Gong,
  • Liandong Wang

DOI
https://doi.org/10.3390/app132212255
Journal volume & issue
Vol. 13, no. 22
p. 12255

Abstract

Read online

Machine learning-based malware (malicious software) detection methods have a wide range of real-world applications. However, these types of approaches suffer from the fatal problem of “model aging”, in which the validity of the model decreases rapidly as the malware continues to evolve and variants emerge continuously. The model aging problem is usually solved by model retraining, which relies on lots of labeled samples obtained at great expense. To address this challenge, this paper proposes a semi-supervised continuous learning malware detection model based on Transformer. Firstly, this model improves the lifelong semi-supervised mixture algorithm to dynamically adjust the weighted combination of new sample sequences and historical ones to solve the imbalance problem. Secondly, the Learning with Local and Global Consistency algorithm is used to iteratively compute similarity scores for the unlabeled samples in the mixed samples to obtain pseudo-labels. Lastly, the Multilayer Perceptron is applied for malware classification. To validate the effectiveness of the model, this paper conducts experiments on the CICMalDroid2020 dataset. The experimental results show that the proposed model performs better than existing deep learning detection models. The F1 score has an average improvement of 1.27% compared to other models when conducting binary classification. And, after inputting hybrid samples, including historical data and new data, four times, the F1 score is still 1.96% higher than other models.

Keywords