arXiv

Bridging Requirements and Architecture: Multi-Agent Orchestration with External Knowledge and Hierarchical Memory

June 2, 2026 · Ruiyin Li, Yiran Zhang, Xiyu Zhou, Yangxiao Cai, Peng Liang, Weisong Sun, Jifeng Xuan, Zhi Jin, Yang Liu

Abstract

Designing software architecture is a pivotal stage in software development, characterized by its complexity and heavy reliance on domain knowledge. Architects must balance conflicting quality attributes while remaining responsive to changing requirements. Historically, this work has been labor-intensive and dependent on human experts, which often limits the exploration of alternative architectural styles and decompositions, particularly under the time pressure of agile development. Although Large Language Model (LLM)-based agents have shown significant potential across many software engineering tasks, their use for architecture design remains underexplored and has not been systematically investigated.

To address these limitations, we introduce MAAD (Multi-Agent Architecture Design), a knowledge-centric framework that coordinates four specialized agents: the Analyst, Modeler, Designer, and Evaluator. Working autonomously and collaboratively, these agents convert requirement specifications into comprehensive, multi-view architectural blueprints accompanied by quality attribute assessments. MAAD uses Retrieval-Augmented Generation (RAG) to bring established architectural standards and patterns into the workflow, and a hierarchical memory mechanism that preserves design history to support iterative refinement.
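
The workflow described above can be pictured as a sequential agent pipeline wrapped in an evaluation loop. The following is a purely illustrative sketch, not the authors' MAAD implementation. All names are hypothetical: the agent functions (`analyst`, `modeler`, `designer`, `evaluator`) are stubs standing in for LLM calls, `PATTERN_KB` is an invented three-entry pattern corpus, the keyword-overlap `retrieve` function is a toy substitute for embedding-based RAG, and `HierarchicalMemory` assumes a simple two-level design (raw records for the current iteration plus condensed summaries of past iterations).

```python
import re
from dataclasses import dataclass, field

# Hypothetical mini knowledge base standing in for the RAG corpus of
# architectural standards and patterns (not taken from the paper).
PATTERN_KB = {
    "layered": "Separate presentation, business logic, and data access into layers.",
    "microservices": "Decompose into independently deployable services around business capabilities.",
    "event-driven": "Communicate through asynchronous events via a message broker.",
}


def tokens(text: str) -> set[str]:
    """Lowercase word tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k pattern names whose text shares the most words with the query.

    Keyword overlap is an assumed stand-in for embedding-based retrieval.
    """
    q = tokens(query)
    ranked = sorted(
        PATTERN_KB,
        key=lambda name: len(q & tokens(name + " " + PATTERN_KB[name])),
        reverse=True,
    )
    return ranked[:k]


@dataclass
class HierarchicalMemory:
    """Two assumed levels: raw records for this iteration, summaries of past ones."""
    working: list[dict] = field(default_factory=list)
    history: list[str] = field(default_factory=list)

    def record(self, agent: str, output: str) -> None:
        self.working.append({"agent": agent, "output": output})

    def consolidate(self) -> None:
        # Compress the finished iteration into one summary line, then reset.
        summary = "; ".join(f"{r['agent']}:{r['output'][:40]}" for r in self.working)
        self.history.append(summary)
        self.working.clear()


# Hypothetical stub agents; in a real system each would wrap an LLM call.
def analyst(spec: str, mem: HierarchicalMemory) -> str:
    return f"requirements: {spec}"


def modeler(reqs: str, mem: HierarchicalMemory) -> str:
    return f"domain model for [{reqs}]"


def designer(model: str, mem: HierarchicalMemory) -> str:
    # Consult memory: retrieve one more pattern for each earlier iteration.
    k = 1 + len(mem.history)
    return "design using " + ", ".join(retrieve(model, k=k))


def evaluator(design: str, mem: HierarchicalMemory) -> str:
    # Toy quality gate: accept only designs that combine at least two patterns.
    used = [name for name in PATTERN_KB if name in design]
    return "PASS" if len(used) >= 2 else "REVISE"


def run_pipeline(spec: str, max_iters: int = 3) -> tuple[str, HierarchicalMemory]:
    mem = HierarchicalMemory()
    design = ""
    for _ in range(max_iters):
        artifact = spec
        # Analyst -> Modeler -> Designer, each consuming the previous output.
        for name, agent in (("Analyst", analyst), ("Modeler", modeler), ("Designer", designer)):
            artifact = agent(artifact, mem)
            mem.record(name, artifact)
        design = artifact
        verdict = evaluator(design, mem)
        mem.record("Evaluator", verdict)
        mem.consolidate()
        if verdict == "PASS":
            break
    return design, mem


if __name__ == "__main__":
    spec = "Order system with independently deployable services and asynchronous events"
    design, mem = run_pipeline(spec)
    print(design)            # design using microservices, event-driven
    print(len(mem.history))  # 2
```

In this toy run, the first pass proposes a single pattern and the Evaluator sends it back; on the second pass the Designer sees one summary in memory, widens its retrieval, and the design passes the gate. Storing condensed summaries in `history` rather than full transcripts is one way to bound context length across iterations; the memory hierarchy in the paper may work differently.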

We evaluated MAAD through comparative experiments against MetaGPT. The evaluation combined quantitative architecture-level metrics across ten case studies with qualitative insights from industry architects on ten real-world specifications. The results indicate that MAAD produces architectures that are more modular, complete, and traceable than those of the MetaGPT baseline. In addition, its dedicated Evaluator agent autonomously generates structured quality assessment reports, substantially reducing the burden of manual validation. Our analysis further shows that the quality of the resulting architecture depends strongly on the reasoning capabilities of the underlying LLM: GPT-5.2 and Qwen3.5 outperformed the other evaluated models in most evaluation scenarios.