From Prompt to Process: A Process Taxonomy and Comparative Assessment of Frameworks Supporting AI Software Development Agents
Abstract
AI programming tools have evolved beyond autocomplete suggestions and conversational assistants into structured development frameworks that define specific processes, roles, artifacts, and verification methods. Existing literature surveys agents and large language models (LLMs) within software engineering, but little research examines the operational frameworks that turn these technical capabilities into coherent workflows. To address this gap, we conducted a directed search of primary sources, applying functional inclusion criteria and traction metrics to select six distinct frameworks: GitHub Spec Kit, OpenSpec, BMAD Method, Get Shit Done (GSD), Spec Kitty, and Reversa.
These frameworks approach AI-driven development in different ways: full and lightweight spec-driven development, agent-driven agile planning, context engineering, worktree isolation with review processes, and the extraction of operational specifications from legacy systems. Our primary contribution is a six-dimensional process taxonomy encompassing specification, context, roles, execution, validation, and portability. From this taxonomy we derived a scoring rubric that serves as a replicable assessment instrument.
We applied this rubric to the six selected frameworks and to Spec-Flow as an out-of-sample case study. Two key findings emerged from this analysis. First, among frameworks that have already integrated some level of process, there is a noticeable convergence: the isolated prompt is losing its central role. Instead, persistent artifacts, work contracts, traceability, and human review are becoming essential mechanisms for reducing ambiguity and coordinating agents. Second, no single framework comprehensively covers all six dimensions, revealing a structural trade-off between the depth of process implementation and portability across different agents.
Additionally, we identified several recurring risks associated with these frameworks, including drift between specifications and code, over-reliance on generated artifacts, the fragility of community extensions, platform dependency, and a general lack of benchmarks for evaluating the entire process. The paper concludes with a research agenda for empirical evaluation, emphasizing the need for intermediate-quality metrics, context governance, installation security, and improved reproducibility.
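As an illustration of how a rubric over the six taxonomy dimensions could be encoded, the following is a minimal sketch only. The abstract does not state the rubric's scale or scoring rules, so the 0 to 3 scale, the `Assessment` class, the `gaps` and `covers_all` helpers, and the placeholder scores are all hypothetical and do not reproduce the paper's instrument or results.

```python
from dataclasses import dataclass
from typing import Dict, List

# The six dimensions named in the abstract.
DIMENSIONS = (
    "specification",
    "context",
    "roles",
    "execution",
    "validation",
    "portability",
)

# ASSUMPTION: a 0..3 ordinal scale; the abstract does not give the real scale.
MAX_SCORE = 3


@dataclass(frozen=True)
class Assessment:
    """Hypothetical container for one framework's rubric scores."""

    framework: str
    scores: Dict[str, int]

    def __post_init__(self) -> None:
        # Require exactly one score per taxonomy dimension.
        if set(self.scores) != set(DIMENSIONS):
            raise ValueError("scores must cover exactly the six dimensions")
        for dim, score in self.scores.items():
            if not 0 <= score <= MAX_SCORE:
                raise ValueError(f"score for {dim} is out of range: {score}")

    def gaps(self, threshold: int = MAX_SCORE) -> List[str]:
        """Dimensions scored below the threshold, in taxonomy order."""
        return [d for d in DIMENSIONS if self.scores[d] < threshold]

    def covers_all(self, threshold: int = MAX_SCORE) -> bool:
        """True only if every dimension reaches the threshold."""
        return not self.gaps(threshold)


# Placeholder values for a made-up framework, NOT results from the paper.
example = Assessment(
    framework="ExampleFramework",
    scores={
        "specification": 3,
        "context": 2,
        "roles": 3,
        "execution": 3,
        "validation": 3,
        "portability": 1,
    },
)
```

With these placeholder scores, `example.gaps()` lists the dimensions that fall short of full coverage, which mirrors the kind of comparison behind the observation that no single framework covers all six dimensions.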
Source: arXiv Generated at: 2026-06-04 00:00:00 UTC





