PeerJ Computer Science (Feb 2022)

Detecting phishing webpages via homology analysis of webpage structure

  • Jian Feng,
  • Yuqiang Qiao,
  • Ou Ye,
  • Ying Zhang

DOI
https://doi.org/10.7717/peerj-cs.868
Journal volume & issue
Vol. 8
p. e868

Abstract

Read online Read online

Phishing webpages are often generated by phishing kits or evolved from existing kits. Therefore, the homology analysis of phishing webpages can help curb the proliferation of phishing webpages from the source. Based on the observation that phishing webpages belonging to the same family have similar page structures, a homology detection method based on webpage clustering according to structural similarity is proposed. The method consists of two stages. The first stage realizes model construction. Firstly, it extracts the structural features and style attributes of webpages through the document structure and vectorizes them, and then assigns different weights to different features, and measures the similarity of webpages and guides webpage clustering by webpage difference index. The second phase completes the detection of webpages to be tested. The fingerprint generation algorithm using double compressions generates fingerprints for the centres of the clusters and the webpages to be tested respectively and accelerates the detection process of the webpages to be tested through bitwise comparison. Experiments show that, compared with the existing methods, the proposed method can accurately locate the family of phishing webpages and can detect phishing webpages efficiently.

Keywords