Fine-Tuning Mini Language Models for Legal Multiple-Choice Question Answering: A Comparative Study of Phi-3.5, Qwen 2.5 and Llama 3.2
Abstract
In this study, we explore the application of mini language models in the legal domain, specifically Phi-3.5 Mini, Qwen 2.5 3B, and Llama 3.2 3B, for legal multiple-choice question answering. We fine-tuned these models on the CaseHOLD dataset to adapt them to the structural and semantic nuances of legal language and reasoning. The results show that fine-tuning significantly improves the performance of these models, with Phi-3.5 Mini achieving a Micro F1 score of 76.93%, exceeding previous best results for miniaturized models. Qwen 2.5 3B and Llama 3.2 3B also achieved competitive scores of 74.27% and 75.40%, respectively, reinforcing their viability as resource-efficient alternatives to larger models. Mini language models offer performance competitive with specialized models such as Legal-BERT and Caselaw-BERT, while operating on lower computational resources and retaining natural language …