Scopus

Pioneering a DeepSeek R1-Generated UML Dataset and an Automated Multimodal Visual Validation Framework

Journal / Conference: International Conference on Advances in Information and Communication Technology (2025) Vol 1, 346-355

Unit: Information Technology (CNTT)

DOI / Link:

Authors

Van-Viet Nguyen ; Huu-Khanh Nguyen ; Kim-Son Nguyen ; Thi Minh-Hue Luong ; The-Vinh Nguyen ; Huu-Cong Nguyen ; Duc-Quang Vu

Corresponding author

Abstract

This paper presents a novel framework for the automated construction and validation of a dataset tailored to UML (Unified Modeling Language) code generation, leveraging recent advances in large language models (LLMs) and multimodal evaluation techniques. The proposed dual-model architecture employs LLaMA 3.2 1B-Instruct to generate software feature descriptions from an end-user perspective, followed by DeepSeek-R1-Distill-Qwen-32B to produce the corresponding UML use case diagrams along with reasoning traces. The resulting dataset comprises 3,000 samples, each containing a feature description paired with a UML diagram. To ensure quality and consistency, a multi-model visual verification system is introduced, incorporating three vision-language models to evaluate the alignment between textual inputs and generated diagrams. Each model assigns a score ranging from 1 to 6, and final scores …