IEEE Access (Jan 2020)

WS-LSMR: Malicious WebShell Detection Algorithm Based on Ensemble Learning

  • Zhuang Ai,
  • Nurbol Luktarhan,
  • Yuxin Zhao,
  • Chaofei Tang

DOI
https://doi.org/10.1109/ACCESS.2020.2989304
Journal volume & issue
Vol. 8
pp. 75785 – 75797

Abstract

Read online

To solve the problem that the features produced by hidden means, such as code obfuscation and compression, in encrypted malicious WebShell files are not the same as those produced by non-encrypted files, a WebShell attack detection algorithm based on ensemble learning is proposed. First, this algorithm extracted the feature vocabulary of the unigrams and 4-grams based on opcode; subsequently, the 4-gram feature word weights were obtained according to the calculated Gini coefficient of the unigram feature words and used to select the features, which will be selected again based on the Gini coefficient of the 4-gram feature words. Consequently, a feature vocabulary that can detect encrypted and unencrypted WebShell files was constructed. Second, in order to improve the adaptability and accuracy of the detection method, an ensemble detection model called WS-LSMR, consisting of a Logistic Regression, Support Vector Machine, Multi-layer Perceptron and Random Forest, was constructed. The model uses a weighted voting method to determine the WebShell classification. This experiment demonstrated that compared with the traditional single WebShell detection algorithm, the recall rate and accuracy rate improved to 99.14% and 94.28%, respectively, which proves that this method has better detection performance.

Keywords