Generative Reasoning Meets Multimodal Evaluation: A Novel Approach to Activity and Package Diagram Comparison in UML Modeling
Abstract
Large Language Models (LLMs) have been applied to many downstream tasks such as writing, reasoning, and software engineering. However, only a fraction of studies have focused on generating code, especially in the field of software requirements. This paper therefore presents a novel framework that leverages generative reasoning and multimodal evaluation to compare two essential Unified Modeling Language (UML) diagram types: Activity Diagrams, which represent behavioral aspects, and Package Diagrams, which represent structural organization. In the proposed approach, LLaMA 3.2-1B-Instruct generates user-oriented prompts, while DeepSeek-R1-Distill-Qwen-32B performs reasoning and automatically produces PlantUML code for diagram construction. To ensure the reliability and accuracy of the generated diagrams, the multimodal models Qwen2.5-VL-3B-Instruct, LLaMA-3.2-11B-Vision-Instruct, and Aya-Vision-8B are employed within the MMMU Benchmark for evaluation and scoring. Experimental results demonstrate that combining reasoning-based generation with multimodal assessment yields significant improvements in diagram consistency, semantic correctness, and cross-model verification. The study contributes a novel methodology for automated UML modeling and highlights the advantages and limitations of Activity versus Package diagrams when generated and validated through generative AI. Beyond advancing the automation of software design, it also proposes a new framework for evaluating diagram-level reasoning ability in model-driven engineering.
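To illustrate the kind of artifact the pipeline targets, a minimal PlantUML activity diagram for a simple login flow might look as follows. This is a hypothetical sketch for illustration only, not an output from the paper's experiments:

```plantuml
@startuml
' Hypothetical activity diagram: a simple login flow (illustrative only)
start
:Enter credentials;
if (Credentials valid?) then (yes)
  :Show dashboard;
else (no)
  :Display error message;
endif
stop
@enduml
```

Reasoning models in the pipeline emit textual specifications like this one, which are then rendered to images and scored by the multimodal evaluators.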