Cybersecurity (Oct 2024)

MVD-HG: multigranularity smart contract vulnerability detection method based on heterogeneous graphs

  • Jingjie Xu,
  • Ting Wang,
  • Mingqi Lv,
  • Tieming Chen,
  • Tiantian Zhu,
  • Baiyang Ji

DOI
https://doi.org/10.1186/s42400-024-00245-5
Journal volume & issue
Vol. 7, no. 1
pp. 1 – 15

Abstract

Smart contracts have suffered significant losses due to various types of vulnerabilities. However, traditional vulnerability detection methods rely heavily on expert rules, resulting in low detection accuracy and poor adaptability to novel attacks. To address these problems, this paper combines deep learning methods with smart contract vulnerability detection. Abstract syntax trees (ASTs), which are special isomorphic graph structures, are an important bridge between source code and graph neural networks. By learning the AST, the model can understand the semantics of the source code. Moreover, graph neural networks are increasingly able to handle complex heterogeneous graphs. Therefore, control flow graphs are fused with data flow graphs on the basis of the ASTs to build heterogeneous graphs with richer code semantics. Furthermore, the vulnerability detection results are analysed at multiple granularities, including coarse-grained contract-level detection and fine-grained line-level detection. This multigranularity approach allows vulnerabilities in contracts to be identified and analysed more comprehensively. The experimental results show that the proposed multigranularity vulnerability detection method based on heterogeneous graphs (MVD-HG) improves both the accuracy and the range of detected vulnerability types in contract-level detection tasks; moreover, in the line-level detection task, the MVD-HG model achieves significant results and addresses the shortcomings of existing methods. In addition, drawing on code generation methods used in related fields, a data enhancement method based on the source code is developed, which effectively expands the experimental dataset and thereby addresses the reduced credibility of results caused by insufficient data.
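
The fusion idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the node list, the edge labels "ast", "cfg" and "dfg", the helper names build_hetero_graph and contract_level, the threshold, and the line scores are all hypothetical. It only shows typed edges sharing one node set, and line-level scores being rolled up into a contract-level verdict.

```python
from collections import defaultdict

# Hypothetical toy nodes for a withdraw function: (node id, source line, node type).
nodes = [
    ("n0", 1, "FunctionDefinition"),
    ("n1", 2, "VariableDeclaration"),  # uint amount = balances[msg.sender];
    ("n2", 3, "ExternalCall"),         # msg.sender.call{value: amount}("");
    ("n3", 4, "Assignment"),           # balances[msg.sender] = 0;
]

# Typed edges over the same node set: syntax (ast), control flow (cfg), data flow (dfg).
edges = [
    ("n0", "ast", "n1"), ("n0", "ast", "n2"), ("n0", "ast", "n3"),
    ("n1", "cfg", "n2"), ("n2", "cfg", "n3"),
    ("n1", "dfg", "n2"),
]


def build_hetero_graph(nodes, edges):
    """Group edges by relation type so each relation can be handled separately."""
    node_ids = {node_id for node_id, _, _ in nodes}
    graph = defaultdict(list)
    for src, relation, dst in edges:
        if src not in node_ids or dst not in node_ids:
            raise ValueError(f"edge ({src}, {dst}) references an unknown node")
        graph[relation].append((src, dst))
    return dict(graph)


def contract_level(line_scores, threshold=0.5):
    """Flag a contract if any line score reaches the (assumed) threshold."""
    flagged = sorted(line for line, score in line_scores.items() if score >= threshold)
    return bool(flagged), flagged


graph = build_hetero_graph(nodes, edges)

# Made-up line-level scores standing in for a trained model's output.
line_scores = {2: 0.1, 3: 0.9, 4: 0.2}
is_vulnerable, flagged_lines = contract_level(line_scores)
```

In this sketch the coarse-grained result is derived from the fine-grained one; whether MVD-HG couples the two tasks this way is not stated in the abstract.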

Keywords