Fine-Tuning Tiny Language Models for Legal Question Answering: A Comparative Study of Gemma 2, Qwen 2.5 and Llama 3.2
Abstract
The domain of legal question answering presents significant challenges for natural language processing due to the complexity and nuance of legal texts. While large language models have shown promise, the efficacy of the latest generation of "tiny" language models (TLMs) remains underexplored. This paper presents a comparative study of three prominent open-weight TLMs - Google's Gemma 2 2B, Alibaba's Qwen 2.5 1.5B, and Meta's Llama 3.2 1B - fine-tuned for legal QA on the CaseHOLD dataset. Employing parameter-efficient fine-tuning (LoRA, QLoRA) on a limited subset of 10,000 training examples to simulate resource-constrained conditions, we evaluate their performance. Our results demonstrate that fine-tuning yields substantial improvements over the base models, with Gemma 2 2B achieving the top Micro F1 score of 74.47, followed by Qwen 2.5 1.5B at 72.17 and Llama 3.2 1B at 71.17. Notably, Gemma 2's performance …