Towards Robust Sequential Decomposition for Complex Image Editing
Title: Advancing Robust Sequential Decomposition for Intricate Image Manipulation
Abstract: While recent breakthroughs in visual generative models have facilitated high-quality image editing driven by human commands, these systems frequently falter when faced with complex directives that entail combinatorial operations or dependencies between steps. This challenge is rooted in the constraints of two primary approaches: first, single-turn editing attempts to execute all instructions in a single iteration, which often results in inaccurate parsing of complex prompts and unintended modifications; second, while sequential editing breaks tasks down into manageable steps, it is prone to accumulating errors during execution, thereby compromising output fidelity. To address these issues, we investigate the editing performance of various paradigms within a unified in-context framework, aiming to balance the advantages of sequential decomposition against its tendency to accumulate errors. Additionally, we introduce a synthetic data pipeline capable of generating editing tasks with varying levels of instructional complexity, enabling the creation of a large-scale dataset featuring high-quality, decomposed editing sequences. Our experiments reveal that fine-tuning on this synthetic data allows sequential decomposition to deliver robust enhancements as task complexity rises, provided the editing paradigms are appropriately structured. Moreover, we demonstrate that decomposition skills acquired from synthetic tasks can be effectively transferred to real-world images through co-training with actual editing data, highlighting the potential for sim-to-real generalization in handling complex image editing across diverse domains.
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC



