Fine-Tuning and Evaluating Distilled DeepSeek Models for Code Reasoning
Abstract
Source code reasoning is a critical aspect of software engineering, demanding language models capable of understanding, generating, and reasoning about complex program logic. This paper evaluates the effectiveness of distilled large language models (LLMs) fine-tuned for reasoning over code. We focus on three models distilled from DeepSeek-R1: two based on the Qwen architecture (with 1.5B and 7B parameters) and one based on LLaMA (8B parameters). All models were fine-tuned on the CodeAlpaca-DeepSeek-32B-Reasoning dataset, which emphasizes logical coherence in code generation and comprehension tasks. Performance was assessed using the ROUGE-L metric. The results show that the DeepSeek-Distill-Qwen-1.5B model achieved the highest ROUGE-L score at 43%, followed by the Qwen-7B model at 40%. In contrast, the LLaMA-8B model lagged significantly behind with only 10 …
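Since ROUGE-L is the paper's sole evaluation metric, the following is a minimal sketch of how it is typically computed: the longest common subsequence (LCS) between the reference and candidate token sequences yields precision and recall, which are combined into an F1 score. This is an illustrative from-scratch implementation on whitespace tokens, not the exact evaluation pipeline used in the paper (which may use a library such as `rouge_score` with stemming).

```python
def lcs_len(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence via classic dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L F1 over whitespace tokens: harmonic mean of LCS precision and recall."""
    ref, cand = reference.split(), candidate.split()
    lcs = lcs_len(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

# Example: five of six tokens form a common subsequence, so P = R = 5/6.
score = rouge_l_f1("the cat sat on the mat", "the cat lay on the mat")
print(round(score, 4))  # → 0.8333
```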