Web2Vec: Phishing Webpage Detection Method Based on Multidimensional Features Driven by Deep Learning

Jian Feng; Lianyang Zou; Ou Ye; Jingzhou Han

doi:10.1109/ACCESS.2020.3043188

IEEE Access (Jan 2020)

Affiliations

Jian Feng: College of Computer Science and Technology, Xi’an University of Science and Technology, Xi’an, China
Lianyang Zou: Information Technology Department for Head Office of SPD Bank, Application Development Services Sub-center (Xi’an), National Institute of Standards and Technology, Xi’an, China
Ou Ye: College of Computer Science and Technology, Xi’an University of Science and Technology, Xi’an, China
Jingzhou Han: College of Computer Science and Technology, Xi’an University of Science and Technology, Xi’an, China

DOI: https://doi.org/10.1109/ACCESS.2020.3043188
Journal volume & issue: Vol. 8
pp. 221214 – 221224

Abstract

Phishing is a kind of online attack that attempts to steal sensitive information from network users. Current phishing webpage detection methods rely mainly on manually collected features, which makes feature extraction complicated and leaves possible correlations between features unavoidable. To address these problems, a new phishing webpage detection model is proposed. Its main components are representation learning, which automatically learns representations from multi-aspect features, and a hybrid deep learning network, which extracts features from those representations. First, the model treats the URL, HTML page content, and DOM (Document Object Model) structure of a webpage as separate character sequences and uses representation learning to learn a representation of the webpage automatically. Then, it sends these representations through different channels to a hybrid deep learning network, composed of a convolutional neural network and a bidirectional long short-term memory network, to extract local and global features, and uses an attention mechanism to strengthen the influence of important features. Finally, the outputs of the channels are fused to produce the classification prediction. Four sets of experiments were conducted to verify the detection performance of the model. The results show that its overall classification performance is better than that of existing classic phishing webpage detection methods, with an accuracy of 99.05% and a false positive rate of only 0.25%. These results indicate that extracting webpage features from multiple aspects through representation learning and a hybrid deep learning network can effectively improve the detection of phishing webpages.
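
The first step described in the abstract, treating a URL (or HTML content, or a DOM structure) as a character sequence, can be illustrated with a minimal sketch. The abstract does not give the vocabulary, sequence length, or padding scheme used by the authors, so the names `build_vocab` and `encode`, the reserved indices `PAD` and `UNK`, the `max_len` value, and the sample URLs are all hypothetical choices for illustration only. The resulting integer sequences are the kind of input an embedding layer would typically consume.

```python
from typing import Dict, List

# Reserved indices: an assumed convention, not taken from the paper.
PAD, UNK = 0, 1


def build_vocab(texts: List[str]) -> Dict[str, int]:
    """Map each distinct character to an integer index, starting at 2."""
    chars = sorted({ch for text in texts for ch in text})
    return {ch: i + 2 for i, ch in enumerate(chars)}


def encode(text: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Turn a string into a fixed-length list of character indices.

    Longer strings are truncated, shorter ones are right-padded with PAD,
    and characters missing from the vocabulary map to UNK.
    """
    ids = [vocab.get(ch, UNK) for ch in text[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


# Hypothetical sample URLs, used only to build a toy vocabulary.
urls = ["http://example.com/login", "http://examp1e.com/verify"]
vocab = build_vocab(urls)
encoded = [encode(u, vocab, max_len=32) for u in urls]
```

In a multi-channel setup like the one the abstract outlines, the same kind of encoding would be applied separately to each input type, with each channel keeping its own vocabulary and length limit.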

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
