IEEE Access (Jan 2024)
Repeatability of Fine-Tuning Large Language Models Illustrated Using QLoRA
Abstract
Large language models (LLMs) have shown progress and promise in diverse applications ranging from the medical field to chatbots. Developing LLMs requires a large corpus of data and significant computational resources to achieve efficient learning. Foundation models (in particular LLMs) serve as the basis for fine-tuning on a new corpus of data. Since the original foundation models contain a very large number of parameters, fine-tuning them can be quite challenging. The development of the low-rank adaptation (LoRA) technique, and its quantized variant, known as QLoRA, allows LLMs to be fine-tuned on a new, smaller corpus of data. This paper focuses on the repeatability of fine-tuning four LLMs using QLoRA. Each model was fine-tuned for seven trials under identical hardware and software settings. We also validated our findings on the repeatability (stability) issue by fine-tuning the LLMs on two public datasets. In each trial, each LLM was fine-tuned on a subset of the dataset and evaluated on a holdout test set. Fine-tuning and inference were performed on a single GPU. Our study shows that fine-tuning of LLMs with the QLoRA method is not repeatable (not stable): different fine-tuning runs result in different performance on the holdout test set.
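For context, a typical QLoRA fine-tuning setup resembles the minimal sketch below, using the Hugging Face transformers, peft, and bitsandbytes libraries. The model name and LoRA hyperparameters shown are illustrative placeholders and are not the configuration used in this study.

```python
# Minimal QLoRA setup sketch (illustrative; not the authors' exact configuration).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder; any causal LM checkpoint

# 4-bit NF4 quantization of the frozen base weights: the core of QLoRA.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Only the small low-rank adapter matrices are trained; the quantized
# base model stays frozen, which is what makes single-GPU fine-tuning feasible.
lora_config = LoraConfig(
    r=16,                                  # adapter rank (placeholder value)
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections (placeholder)
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # adapters are a small fraction of all weights
```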
Keywords