Law, Technology and Humans (Nov 2024)
Case Law as Data: Prompt Engineering Strategies for Case Outcome Extraction with Large Language Models in a Zero-Shot Setting
Abstract
This study explores the effectiveness of prompt optimization techniques for legal case outcome extraction using Large Language Models (LLMs). Two state-of-the-art LLMs, LLaMA3 70b and Mixtral 8x7b, are applied to a zero-shot data extraction task on a diverse dataset of 400 French appellate court decisions. The results show that LLMs are highly effective at this extraction task even without task-specific examples. Baseline prompts achieve strong performance, with a best F1 score of 0.980 and a worst F1 score of 0.853. Optimized prompts yield varying degrees of improvement, with a best F1 score of 0.994 and a worst F1 score of 0.912. While some optimized prompts produce significant gains, others yield only marginal changes or even degrade performance. These results suggest that the optimization process has a non-uniform impact on performance metrics, and that the effectiveness of an optimized prompt depends on the specific model and dataset used. Taken together, the findings underscore the importance of prompt engineering for reliable Legal Information Extraction and Litigation Analytics.
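To make the zero-shot setup concrete, the sketch below shows what a baseline outcome-extraction prompt of this kind might look like. It is a minimal illustration under stated assumptions, not the authors' actual prompt: the outcome labels, the instruction wording, and the query_llm() wrapper are hypothetical placeholders introduced here for the example.

```python
# A minimal sketch of a zero-shot outcome-extraction prompt, for illustration only.
# The exact prompts used in the study are not reproduced here; the wording, the
# outcome labels, and the query_llm() helper are assumptions, not the authors' code.

def build_baseline_prompt(decision_text: str) -> str:
    """Wrap a French appellate decision in a zero-shot extraction instruction."""
    return (
        "You are a legal analyst. Read the following French appellate court "
        "decision and state its outcome using exactly one label: "
        "'affirmed', 'reversed', or 'partially reversed'.\n\n"
        f"Decision:\n{decision_text}\n\nOutcome:"
    )

# Usage (assumes some query_llm(prompt, model) wrapper around the chosen LLM):
# outcome = query_llm(build_baseline_prompt(text), model="llama3-70b")
```

In a zero-shot setting such as this, the model receives only the instruction and the decision text, with no labeled examples; prompt optimization then amounts to refining the instruction itself rather than adding demonstrations.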
Keywords