Scopus

Pioneering a DeepSeek R1-Generated UML Dataset and an Automated Multimodal Visual Validation Framework

Publication year: 2026
Journal / Conference: International Conference on Advances in Information and Communication Technology (2025), Vol 1, 346-355
Unit: Information Technology (CNTT)
DOI / Link: https://doi.org/10.1007/978-3-032-18159-6_37

Authors

Van-Viet Nguyen; Huu-Khanh Nguyen; Kim-Son Nguyen; Thi Minh-Hue Luong; The-Vinh Nguyen (corresponding author); Huu-Cong Nguyen; Duc-Quang Vu

Abstract

This paper presents a novel framework for the automated construction and validation of a dataset tailored to UML (Unified Modeling Language) code generation, leveraging recent advances in large language models (LLMs) and multimodal evaluation techniques. The proposed dual-model architecture employs LLaMA 3.2 1B-Instruct to generate software feature descriptions from an end-user perspective, followed by DeepSeek-R1-Distill-Qwen-32B to produce the corresponding UML use case diagrams along with reasoning traces. The resulting dataset comprises 3,000 samples, each containing a feature description paired with a UML diagram. To ensure quality and consistency, a multi-model visual verification system is introduced, incorporating three vision-language models to evaluate the alignment between textual inputs and generated diagrams. Each model assigns a score ranging from 1 to 6, and final scores …

References

[1] Booch, G.: The unified modeling language user guide. Pearson Education India (2005)

[2] Ambler, S.W.: The Object Primer: Agile Model-Driven Development with UML 2.0. Cambridge University Press (2004)

[3] Van Nguyen*, V., Nguyen, V.T.: Large language models in software engineering: a systematic review and vision. J. Educ. Sustainable Innov. ISSN 3025–1052 (2024). https://ejournal.papanda.org/index.php/jesi/article/view/968

[4] Brown, T., et al.: Language models are few-shot learners. Adv. Neural. Inf. Process. Syst. 33, 1877–1901 (2020)

[5] Van Viet*, N., et al.: Revolutionizing education: An extensive analysis of large language models integration, vol. 4, no. 4, pp. 10–21 (2024). https://irjstem.com/wp-content/uploads/2025/02/IRJSTEM-V4N4-2024-Paper02.pdf. https://doi.org/10.5281/zenodo.14744029

[6] Khanh*, N.H., Van, V.N., Vinh, N.T., Cong, N.H.: Phi-3 meets law: fine-tuning mini language models for legal document understanding. Res. Dev. Appl. Inf. Commun. Technol. 2024(3), 136–142 (2024). ISSN: 1859–3526

[7] Mukhtar, M.I., Galadanci, B.S.: Automatic code generation from UML diagrams: the state-of-the-art. Sci. World J. 13(4), 47–60 (2018)

[8] Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pp. 311–318 (2002)

[9] Lin, C.Y., Och, F.J.: Looking for a few good metrics: ROUGE and its evaluation. In: NTCIR Workshop (2004)

[10] Wang, P., et al.: Qwen2-VL: Enhancing Vision-Language Model’s Perception of the World at Any Resolution. arXiv preprint arXiv:2409.12191 (2024)

[11] Üstün, A., et al.: Aya model: An instruction finetuned open-access multilingual language model. arXiv preprint arXiv:2402.07827 (2024)

[12] Yue, X., et al.: MMMU: a massive multi-discipline multimodal understanding and reasoning benchmark for expert AGI. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9556–9567 (2024)

[13] Touvron, H., et al.: LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971 (2023)

[14] DeepSeek-AI: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (2025). https://arxiv.org/abs/2501.12948

[15] Conrardy, A., Cabot, J.: From image to UML: first results of image based UML diagram generation using LLMs. arXiv preprint arXiv:2404.11376 (2024)

[16] Wei, J., et al.: Chain-of-thought prompting elicits reasoning in large language models. In: Proceedings of the 36th International Conference on Neural Information Processing Systems (NIPS ’22). Curran Associates Inc., Red Hook, NY, USA (2022)

[17] Chen, M., et al.: Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374 (2021)
