Sensors (Jun 2024)

Ethereum Phishing Scam Detection Based on Data Augmentation Method and Hybrid Graph Neural Network Model

  • Zhen Chen
  • Sheng-Zheng Liu
  • Jia Huang
  • Yu-Han Xiu
  • Hao Zhang
  • Hai-Xia Long

DOI
https://doi.org/10.3390/s24124022
Journal volume & issue
Vol. 24, no. 12
p. 4022

Abstract


The rapid advancement of blockchain technology has fueled the growth of the cryptocurrency market. Unfortunately, it has also facilitated criminal activity, most notably the rising number of phishing scams on blockchain platforms such as Ethereum. Developing an efficient phishing detection system is therefore critical to the security and reliability of cryptocurrency transactions. However, existing methods struggle with sample imbalance and with effective feature extraction. To address these issues, this study proposes DA-HGNN (Data Augmentation Method and Hybrid Graph Neural Network Model), an Ethereum phishing scam detection method whose effectiveness is validated on real Ethereum datasets. First, a set of basic node features comprising 11 attributes was designed. A sliding window sampling method over each node's transactions was then applied for data augmentation. Because phishing nodes often initiate numerous transactions, each phishing node yields many windowed samples, which pushes the augmented dataset toward class balance. Next, the Temporal Features Extraction Module used Conv1D (a one-dimensional convolutional neural network) and GRU-MHA (GRU with Multi-Head Attention) models to uncover intrinsic relationships among features in the time sequences and to mine sufficient local features, producing the temporal features. The GAE (Graph Autoencoder) concept was then applied, with SAGEConv (GraphSAGE Convolution) as the encoder. In the SAGEConv reconstruction module, the structural features of the nodes were learned by reconstructing the relationships between transaction graph nodes, yielding reconstructed node embedding representations. Finally, phishing fraud nodes were identified by integrating the temporal features, basic features, and embedding representations.
A real Ethereum dataset was collected for evaluation, and the DA-HGNN model achieved an AUC-ROC (Area Under the Receiver Operating Characteristic Curve) of 0.994, a Recall of 0.995, and an F1-score of 0.994, outperforming existing methods and baseline models.
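The sliding-window augmentation step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the window length, the stride, the rule that a history shorter than one window is kept as a single sample, and the names `sliding_windows` and `augment` are all hypothetical, and transactions are stand-in integers rather than the 11-attribute feature vectors.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def sliding_windows(txs: Sequence, window: int, stride: int) -> List[list]:
    """Cut one node's time-ordered transaction sequence into overlapping windows."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    txs = list(txs)
    if len(txs) <= window:
        # Assumed rule (not from the paper): a short history becomes one sample.
        return [txs]
    # Windows start at 0, stride, 2*stride, ...; a trailing remainder that
    # cannot fill a whole window is not emitted as a separate sample.
    return [txs[i:i + window] for i in range(0, len(txs) - window + 1, stride)]


def augment(node_txs: Dict[str, Sequence], labels: Dict[str, int],
            window: int, stride: int) -> List[Tuple[str, list, int]]:
    """Turn each node into one or more (node, window, label) samples."""
    samples = []
    for node, txs in node_txs.items():
        for w in sliding_windows(txs, window, stride):
            samples.append((node, w, labels[node]))
    return samples


if __name__ == "__main__":
    # Toy data: 2 phishing nodes with long histories, 10 normal nodes with short ones.
    node_txs = {f"phish_{i}": list(range(30)) for i in range(2)}
    node_txs.update({f"normal_{i}": list(range(5)) for i in range(10)})
    labels = {n: int(n.startswith("phish")) for n in node_txs}

    print("nodes:  ", Counter(labels.values()))
    samples = augment(node_txs, labels, window=10, stride=5)
    print("samples:", Counter(label for _, _, label in samples))
```

With `window=10` and `stride=5`, each 30-transaction phishing node yields 5 windows while each 5-transaction normal node yields one, so the toy class counts move from 2 vs 10 at the node level to 10 vs 10 at the sample level. Because windows from the same node are correlated, splitting train and test data by node (an assumption here, not something the abstract states) keeps windows of one node from leaking across splits.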

Keywords