IEEE Access (Jan 2024)
Zero and Few-Shot Learning Using Large Language Models for De-Identification of Medical Records
Abstract
This paper evaluates and compares the performance and fine-tuning cost of several Large Language Models (LLMs), including GPT-3.5, GPT-4, PaLM, Bard, and Llama, in automating the de-identification of Protected Health Information (PHI) from medical records, preserving the privacy of patients and healthcare professionals. Zero-shot learning was used first to assess each LLM's ability to de-identify medical data; each model was then fine-tuned with training sets of varying sizes to measure the resulting change in performance. The study also investigates how the specificity of prompts affects de-identification accuracy. Fine-tuning LLMs on specific examples significantly improved de-identification accuracy, surpassing the zero-shot accuracy of the pre-trained counterparts. Notably, a GPT-3.5 model fine-tuned with a few-shot learning technique exceeded the performance of a zero-shot GPT-4 model, reaching 99% accuracy. Detailed prompts yielded higher task accuracy across all models, yet fine-tuned models given brief instructions still outperformed pre-trained models given detailed prompts. The fine-tuned models were also more resilient to changes in medical record format than the zero-shot models. Code, calculations, and comparisons are available at https://github.com/YashwanthYS/De-Identification-of-medical-Records. The findings underscore the potential of LLMs, particularly when fine-tuned, to automate the de-identification of PHI in medical records effectively, and highlight the importance of model training and prompt specificity in achieving high accuracy in de-identification tasks.
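The prompt-specificity comparison described in the abstract can be sketched as a small helper that builds either a brief or a detailed de-identification prompt. This is an illustrative assumption only: the function name, the `[REDACTED]` placeholder, and the PHI category list are hypothetical and are not the authors' actual prompts.

```python
# Hypothetical prompt builder contrasting a brief instruction with a
# detailed prompt that enumerates PHI categories (illustrative only;
# not the prompts used in the paper).

PHI_CATEGORIES = [
    "patient name", "clinician name", "date of birth",
    "address", "phone number", "medical record number",
]

def build_deid_prompt(record: str, detailed: bool = True) -> str:
    """Return a de-identification prompt for an LLM, brief or detailed."""
    if detailed:
        categories = ", ".join(PHI_CATEGORIES)
        instruction = (
            "Replace every instance of the following PHI categories "
            f"with the placeholder [REDACTED]: {categories}."
        )
    else:
        instruction = "Remove all PHI from the record below."
    return f"{instruction}\n\nRecord:\n{record}"
```

Either prompt string would then be sent to the model under test (e.g. via each vendor's chat API), with the detailed variant expected to yield higher accuracy for pre-trained models, per the abstract's findings.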
Keywords