Shifting the Breaking Point of Flow Matching for Multi-Instance Editing
Title: Redefining the Limits of Flow Matching for Multi-Instance Image Editing
Abstract:
Flow matching models have recently gained traction as an efficient alternative to diffusion models, particularly for text-guided image generation and editing. By leveraging continuous-time dynamics, these models enable significantly faster inference. However, current flow-based editing methods are largely restricted to global modifications or single-instruction edits. They struggle in multi-instance settings, where distinct regions of a reference image must be edited independently without semantic interference between edits.
This paper identifies the root cause of this limitation as the entanglement of concurrent edits, driven by globally conditioned velocity fields and joint attention mechanisms. To resolve this, we propose Instance-Disentangled Attention, a novel mechanism designed to partition joint attention operations. This approach ensures that instance-specific textual commands are strictly bound to their corresponding spatial regions during the estimation of velocity fields.
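To make the idea of partitioning joint attention concrete, the following is a minimal NumPy sketch of one plausible reading, not the paper's implementation. It assumes each text token carries an instance id and each image token carries the id of the region it falls in (-1 for background). The names `build_instance_mask` and `masked_joint_attention` are hypothetical, and keeping image-to-image attention dense is an assumption made here to preserve global coherence.

```python
import numpy as np

def build_instance_mask(text_inst, image_inst):
    """Boolean (T+N, T+N) mask; True means the query row may attend to the key column.

    text_inst:  (T,) instance id of each text token (hypothetical labelling).
    image_inst: (N,) instance id of each image token, -1 for background.
    """
    text_inst = np.asarray(text_inst)
    image_inst = np.asarray(image_inst)
    T, N = len(text_inst), len(image_inst)
    mask = np.zeros((T + N, T + N), dtype=bool)
    # Text-to-text: tokens attend only within the same instance prompt.
    mask[:T, :T] = text_inst[:, None] == text_inst[None, :]
    # Text-to-image and image-to-text: only between a prompt and its own region.
    cross = text_inst[:, None] == image_inst[None, :]
    mask[:T, T:] = cross
    mask[T:, :T] = cross.T
    # Image-to-image stays dense (assumption) so the output remains globally coherent.
    mask[T:, T:] = True
    return mask

def masked_joint_attention(q, k, v, mask):
    """Scaled dot-product attention over concatenated text and image tokens."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)      # block disallowed pairs
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Toy example: two instance prompts (2 tokens each) and six image tokens,
# of which two belong to each instance region and two are background.
rng = np.random.default_rng(0)
text_inst = [0, 0, 1, 1]
image_inst = [0, 0, 1, 1, -1, -1]
mask = build_instance_mask(text_inst, image_inst)
tokens = rng.standard_normal((10, 8))
out, weights = masked_joint_attention(tokens, tokens, tokens, mask)
```

In this sketch, the prompt for instance 0 places zero attention weight on the region and prompt of instance 1, while background image tokens attend to no text at all. In a real model the queries, keys, and values would come from learned projections inside each transformer block that estimates the velocity field.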
We evaluate our method on natural image editing tasks and on a new benchmark of text-heavy infographics with region-level editing instructions. Our experiments indicate that the proposed method achieves edit disentanglement and spatial locality while maintaining global coherence in the output, enabling efficient, single-pass editing at the instance level.
Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC






