Pioneering a DeepSeek R1-Generated UML Dataset and an Automated Multimodal Visual Validation Framework
Abstract
This paper presents a novel framework for the automated construction and validation of a dataset tailored to Unified Modeling Language (UML) code generation, leveraging recent advances in large language models (LLMs) and multimodal evaluation techniques. The proposed dual-model architecture uses LLaMA 3.2 1B-Instruct to generate software feature descriptions from an end-user perspective, and then DeepSeek-R1-Distill-Qwen-32B to produce the corresponding UML use case diagrams along with reasoning traces. The resulting dataset comprises 3,000 samples, each pairing a feature description with a UML diagram. To ensure quality and consistency, a multi-model visual verification system is introduced, in which three vision-language models evaluate the alignment between the textual inputs and the generated diagrams. Each model assigns a score ranging from 1 to 6, and final scores …