A Payload Based Malicious HTTP Traffic Detection Method Using Transfer Semi-Supervised Learning

Tieming Chen; Yunpeng Chen; Mingqi Lv; Gongxun He; Tiantian Zhu; Ting Wang; Zhengqiu Weng

doi:10.3390/app11167188

Applied Sciences (Aug 2021)

Affiliations

Tieming Chen: College of Computer Science, Zhejiang University of Technology, Hangzhou 310023, China
Yunpeng Chen: College of Computer Science, Zhejiang University of Technology, Hangzhou 310023, China
Mingqi Lv: College of Computer Science, Zhejiang University of Technology, Hangzhou 310023, China
Gongxun He: College of Computer Science, Zhejiang University of Technology, Hangzhou 310023, China
Tiantian Zhu: College of Computer Science, Zhejiang University of Technology, Hangzhou 310023, China
Ting Wang: College of Computer Science, Zhejiang University of Technology, Hangzhou 310023, China
Zhengqiu Weng: College of Computer Science, Zhejiang University of Technology, Hangzhou 310023, China

DOI: https://doi.org/10.3390/app11167188
Journal volume & issue: Vol. 11, no. 16, p. 7188

Abstract

Malicious HTTP traffic detection plays an important role in web application security. Most existing work applies machine learning and deep learning techniques to build malicious HTTP traffic detection models. However, these approaches still suffer from the high cost of collecting training data and from poor cross-dataset generalization. To address these problems, this paper proposes DeepPTSD, a deep learning method for payload-based malicious HTTP traffic detection. First, it treats malicious HTTP traffic detection as a text classification problem: it trains an initial detection model using TextCNN on a public dataset and then adapts that model to the target dataset with a transfer learning algorithm. Second, within the transfer learning procedure, it uses a semi-supervised learning algorithm to accomplish the model adaptation task. The semi-supervised learning algorithm enhances the target dataset through an HTTP payload data augmentation mechanism so that both labeled and unlabeled data can be exploited. We evaluate DeepPTSD on two real HTTP traffic datasets. The results show that DeepPTSD achieves competitive performance under small-data conditions.
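
As a rough illustration of what "treating a payload as text" and "payload data augmentation" could look like in practice, the sketch below encodes payload strings as fixed-length character-id sequences (the kind of input a TextCNN-style classifier might consume) and generates label-preserving variants by percent-encoding random characters. The abstract does not specify the paper's tokenization or augmentation mechanism, so the function names, the character-level vocabulary, the maximum length of 64, and the percent-encoding strategy are all assumptions made for this example. It also assumes the input payloads are already URL-decoded.

```python
import random
from urllib.parse import unquote

PAD_ID = 0  # id used to pad short payloads
UNK_ID = 1  # id used for characters missing from the vocabulary


def build_vocab(payloads):
    """Map every character seen in the payloads to an integer id (ids start at 2)."""
    chars = sorted({ch for p in payloads for ch in p})
    return {ch: i + 2 for i, ch in enumerate(chars)}


def encode(payload, vocab, max_len=64):
    """Turn a payload string into a fixed-length list of character ids."""
    ids = [vocab.get(ch, UNK_ID) for ch in payload[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))


def augment(payload, rng, rate=0.3):
    """Hypothetical augmentation: percent-encode a random subset of ASCII
    alphanumeric characters. URL-decoding the variant yields the original
    payload, so the label is assumed to stay the same."""
    out = []
    for ch in payload:
        if ch.isascii() and ch.isalnum() and rng.random() < rate:
            out.append("%{:02X}".format(ord(ch)))
        else:
            out.append(ch)
    return "".join(out)


# Illustrative payloads (invented for this example, not taken from the paper).
rng = random.Random(0)
payloads = ["id=1' OR '1'='1", "q=hello world"]
variants = [augment(p, rng) for p in payloads]
vocab = build_vocab(payloads + variants)
batch = [encode(p, vocab) for p in payloads + variants]
```

In a semi-supervised setting, variants like these could be generated for unlabeled payloads as well, with a model encouraged to give consistent predictions for a payload and its variants; whether DeepPTSD does exactly this is not stated in the abstract.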

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
