Robust Vulnerability Detection in Solidity-Based Ethereum Smart Contracts Using Fine-Tuned Transformer Encoder Models

Thi-Thu-Huong Le; Jaehyun Kim; Sangmyeong Lee; Howon Kim

doi:10.1109/ACCESS.2024.3482389

IEEE Access (Jan 2024)

Affiliations

Thi-Thu-Huong Le: Blockchain Platform Research Center, Pusan National University, Busan, South Korea
Jaehyun Kim: School of Computer Science and Engineering, Pusan National University, Busan, South Korea
Sangmyeong Lee: School of Computer Science and Engineering, Pusan National University, Busan, South Korea
Howon Kim: School of Computer Science and Engineering, Pusan National University, Busan, South Korea

DOI: https://doi.org/10.1109/ACCESS.2024.3482389
Journal volume & issue: Vol. 12
pp. 154700 – 154717

Abstract

The rapid expansion of blockchain technology, particularly Ethereum, has driven widespread adoption of smart contracts. However, the security of these contracts remains a critical concern due to the increasing frequency and complexity of vulnerabilities. This paper presents a comprehensive approach to detecting vulnerabilities in Ethereum smart contracts using pre-trained Large Language Models (LLMs). We apply transformer-based LLMs, leveraging their ability to understand and analyze Solidity code to identify potential security flaws. Our methodology involves fine-tuning eight distinct pre-trained LLMs on curated datasets that vary in the types and distributions of vulnerabilities, including multi-class vulnerabilities. The datasets (SB Curate, Benchmark Solidity Smart Contract, and ScrawlD) were selected to ensure a thorough evaluation of model performance across different vulnerability types. We employed over-sampling techniques to address class imbalances, resulting in more reliable training outcomes. We extensively evaluate these models using precision, recall, accuracy, F1 score, and Receiver Operating Characteristic (ROC) curve metrics. Our results demonstrate that the transformer encoder architecture, with its multi-head attention and feed-forward mechanisms, effectively captures the nuances of smart contract vulnerabilities. The models show promising potential in enhancing the security and reliability of Ethereum smart contracts, offering a robust solution to challenges posed by software vulnerabilities in the blockchain ecosystem.
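
The abstract mentions over-sampling to address class imbalance and evaluation with precision, recall, and F1 score, without naming a specific over-sampling method. The following is a minimal sketch of one common choice, random over-sampling by duplicating minority-class samples, together with per-class metrics computed from raw counts. The function names `random_oversample` and `per_class_metrics` and the labels `safe` and `reentrancy` are hypothetical and used only for illustration; they are not taken from the paper.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until all classes
    are as large as the biggest class. (Assumed technique; the paper
    may use a different over-sampling method.)"""
    rng = random.Random(seed)
    target = max(Counter(labels).values())

    # Group samples by their class label, preserving first-seen order.
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out_samples.extend(items + extra)
        out_labels.extend([label] * target)
    return out_samples, out_labels


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every observed label."""
    result = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        result[label] = (precision, recall, f1)
    return result
```

In a fine-tuning pipeline, over-sampling of this kind would typically be applied to the training split only, so that the evaluation data keeps its original class distribution.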

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
