Prompt Engineering or Fine-Tuning? A Case Study on Phishing Detection with Large Language Models

Fouad Trad; Ali Chehab

doi:10.3390/make6010018

Machine Learning and Knowledge Extraction (Feb 2024)

Affiliations

Fouad Trad: Electrical and Computer Engineering, American University of Beirut, Beirut 1107-2020, Lebanon
Ali Chehab: Electrical and Computer Engineering, American University of Beirut, Beirut 1107-2020, Lebanon

DOI: https://doi.org/10.3390/make6010018
Journal volume & issue: Vol. 6, no. 1
pp. 367 – 384

Abstract

Large Language Models (LLMs) are reshaping the landscape of Machine Learning (ML) application development. The emergence of versatile LLMs capable of undertaking a wide array of tasks has reduced the necessity for intensive human involvement in training and maintaining ML models. Despite these advancements, a pivotal question emerges: can these generalized models negate the need for task-specific models? This study addresses this question by comparing the effectiveness of LLMs in detecting phishing URLs when used with prompt-engineering techniques versus when fine-tuned. We explore multiple prompt-engineering strategies for phishing URL detection and apply them to two chat models, GPT-3.5-turbo and Claude 2. With this approach, the best result was an F1-score of 92.74% on a test set of 1000 samples. We then fine-tune a range of base LLMs, including GPT-2, Bloom, Baby LLaMA, and DistilGPT-2 (all primarily developed for text generation), exclusively for phishing URL detection. Fine-tuning reached a peak F1-score of 97.29% and an AUC of 99.56% on the same test set, outperforming existing state-of-the-art methods. These results show that while LLMs harnessed through prompt engineering can expedite application development and achieve decent performance, they are not as effective as dedicated, task-specific LLMs.
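
To make the prompt-engineering side of the comparison concrete, here is a minimal sketch of how a zero-shot phishing-URL prompt could be built, how a chat model's reply could be mapped to a label, and how the F1-score reported in the abstract is computed. The prompt wording and the names `build_zero_shot_prompt`, `parse_label`, and `f1_score` are illustrative assumptions, not the authors' prompts or code; no model is actually called.

```python
from typing import Sequence


def build_zero_shot_prompt(url: str) -> str:
    """Build a hypothetical zero-shot prompt (wording is an assumption,
    not the prompt used in the paper)."""
    return (
        "Classify the following URL as 'phishing' or 'legitimate'. "
        "Answer with a single word.\n"
        f"URL: {url}"
    )


def parse_label(reply: str) -> int:
    """Map a model reply to 1 (phishing) or 0 (legitimate).

    Naive rule, assumed for illustration: the reply counts as phishing
    only if it starts with the word 'phishing'.
    """
    return 1 if reply.strip().lower().startswith("phishing") else 0


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), with phishing (1) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Made-up replies standing in for chat-model output.
    replies = ["Phishing", "legitimate", "Legitimate.", "phishing"]
    y_true = [1, 1, 0, 0]
    y_pred = [parse_label(r) for r in replies]
    print(build_zero_shot_prompt("http://example.test/login"))
    print(f"F1 = {f1_score(y_true, y_pred):.2f}")
```

In a real evaluation the replies would come from the chat model for each of the test URLs, and the same `f1_score` computation would apply to the predictions of a fine-tuned classifier, which is what makes the two approaches directly comparable.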

Published in Machine Learning and Knowledge Extraction

ISSN: 2504-4990 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics: Computer engineering. Computer hardware
Website: https://www.mdpi.com/journal/make
