IEEE Access (Jan 2024)

Don’t Stop Believin’: A Unified Evaluation Approach for LLM Honeypots

  • Simon B. Weber
  • Marc Feger
  • Michael Pilgermann

DOI
https://doi.org/10.1109/ACCESS.2024.3472460
Journal volume & issue
Vol. 12
pp. 144579–144587

Abstract

The research area of honeypots is gaining new momentum, driven by advancements in large language models (LLMs). The chat-based applications of generative pretrained transformer (GPT) models seem ideal for use as honeypot backends, especially in request-response protocols like Secure Shell (SSH). By leveraging LLMs, many challenges associated with traditional honeypots – such as high development costs, ease of exposure, and breakout risks – appear to be solved. While early studies have primarily focused on the potential of these models, our research investigates the current limitations of GPT-3.5 by analyzing three datasets of varying complexity. We conducted an expert annotation of over 1,400 request-response pairs, encompassing 230 different base commands. Our findings reveal that while GPT-3.5 struggles to maintain context, incorporating session context into response generation improves the quality of SSH responses. Additionally, we explored whether distinguishing between convincing and non-convincing responses is a metrics issue. We propose a paraphrase-mining approach to address this challenge, which achieved a macro F1 score of 77.85% using cosine distance in our evaluation. This method has the potential to reduce annotation efforts, converge LLM-based honeypot performance evaluation, and facilitate comparisons between new and previous approaches in future research.
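To illustrate the kind of evaluation the abstract describes, the sketch below shows how a cosine-distance check over sentence embeddings could flag an LLM-generated SSH response as convincing or not. This is not the authors' code: the embedding model, the threshold, and the pairing of each generated response with a reference response are illustrative assumptions; the abstract only states that a paraphrase-mining approach with cosine distance was used.

```python
# Minimal sketch (assumptions noted above): judge an LLM-generated SSH response
# by its cosine distance to a reference response for the same command.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def is_convincing(llm_response: str, reference_response: str,
                  threshold: float = 0.3) -> bool:
    """Return True if the generated response lies within the cosine-distance
    threshold of the reference response (threshold is a placeholder value)."""
    emb = model.encode([llm_response, reference_response], convert_to_tensor=True)
    cosine_distance = 1.0 - util.cos_sim(emb[0], emb[1]).item()
    return cosine_distance <= threshold

# Example: compare a generated `uname -a` reply against a real system's output.
print(is_convincing(
    "Linux honeypot 5.15.0-91-generic #101-Ubuntu SMP x86_64 GNU/Linux",
    "Linux server01 5.15.0-91-generic #101-Ubuntu SMP x86_64 GNU/Linux",
))
```

In such a setup, the distance threshold would have to be tuned against annotated data (e.g., the expert-labeled request-response pairs mentioned above) to reproduce a comparable macro F1 evaluation.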

Keywords