ISI/SCIE/SSCI

A Novel AI-Driven Approach to UML Dataset Generation and Multimodal Verification in the Design Phase

Journal / Conference: KSII Transactions on Internet and Information Systems, Vol. 20, No. 4, April 30, 2026
Unit: Information Technology (CNTT)
DOI / Link:

Authors

Corresponding author

Abstract

The Design phase plays a critical role in software engineering by bridging requirements and implementation through Unified Modeling Language (UML) diagrams. Despite its importance, the manual creation of high-quality UML datasets is labor-intensive, creating a data scarcity that hinders the development of intelligent modeling tools. To address this, we propose a comprehensive AI-driven framework for the automated generation and multimodal verification of UML diagrams. Our approach uses a dual-model pipeline: LLaMA 3.2-1B-Instruct generates detailed technical specifications, while DeepSeek-R1-Distill-Qwen-32B leverages its reasoning capabilities to synthesize syntactically precise PlantUML code. We generated 8,000 samples covering Class, Object, Component, and Package diagrams. Crucially, we introduce a novel Multimodal Verification System that employs an ensemble of three Vision-Language Models (VLMs), namely Qwen2.5-VL, LLaMA-3.2-Vision, and Aya-Vision, to validate diagram fidelity. To mitigate model bias, we apply a weighted scoring strategy in which each model's weight is proportional to its MMMU benchmark performance. Statistical analysis reveals a strong correlation between our automated scoring and human expert evaluation (Pearson r > 0.65), validating the system's reliability. The resulting dataset and framework provide a foundational resource for the AI for Software Engineering community, significantly reducing the manual effort required for dataset construction.
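The weighted scoring and correlation check described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the model keys, the benchmark numbers, the score scale, and the function names below are all hypothetical placeholders chosen for illustration, and the benchmark values are not real MMMU results.

```python
from math import sqrt

# Placeholder benchmark scores for three hypothetical VLMs.
# These are NOT real MMMU results; substitute the published figures.
ILLUSTRATIVE_BENCHMARK = {"vlm_a": 60.0, "vlm_b": 50.0, "vlm_c": 40.0}


def normalize_weights(benchmark):
    """Turn benchmark scores into weights that sum to 1 (proportional weighting)."""
    total = sum(benchmark.values())
    if total <= 0:
        raise ValueError("benchmark scores must sum to a positive value")
    return {name: score / total for name, score in benchmark.items()}


def ensemble_score(model_scores, weights):
    """Weighted average of per-model fidelity scores for one diagram."""
    missing = set(weights) - set(model_scores)
    if missing:
        raise KeyError(f"missing scores for: {sorted(missing)}")
    return sum(weights[name] * model_scores[name] for name in weights)


def pearson_r(xs, ys):
    """Pearson correlation between automated scores and human ratings."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


weights = normalize_weights(ILLUSTRATIVE_BENCHMARK)
# Hypothetical per-model scores for a single diagram on an assumed 0-10 scale.
combined = ensemble_score({"vlm_a": 8.0, "vlm_b": 6.0, "vlm_c": 4.0}, weights)
```

Normalizing the weights keeps the combined score on the same scale as the individual model scores, so it can be compared directly with human ratings through `pearson_r`.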

Keywords

Automated UML Generation; Multimodal Verification; Vision-Language Models (VLMs); AI for Software Engineering; Large Language Models (LLMs)