Bootstrap Your Generator: Unpaired Visual Editing with Flow Matching
Abstract:
Contemporary generative models capture visual content well, but adapting them for image editing has traditionally required large datasets of paired examples. This dependency limits scalability, particularly for video editing, where paired data is prohibitively costly to acquire. To address this, we introduce Bootstrap Your Generator (ByG), a general framework for unpaired training of flow matching editing models. ByG relies on the knowledge already present in the base model and requires no external signals. It combines instruction-following cues derived from the frozen model with cycle consistency to preserve structure. To make this computationally feasible, we propose a novel technique for routing gradients from downstream losses, which are computed on clean predictions, back to the noisy training states. Our approach achieves state-of-the-art performance in challenging data-scarce settings for both image and video editing. Comprehensive evaluations and user studies confirm that ByG generalizes to novel domains and surpasses supervised baselines trained on millions of samples. Our analysis further indicates that gradient routing narrows the gap between training and inference, and that semantic cues extracted from the base model provide a robust training signal, eliminating the need for external reward models.
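As a rough illustration of how a loss on a clean prediction can relate to a noisy training state, the sketch below uses the standard linear-path flow matching identity. It is not the gradient routing technique proposed here, whose details the abstract does not give. The function names `noisy_state` and `clean_estimate`, the convention that `x0` is noise and `x1` is data, and the array shapes are all assumptions made for this example.

```python
import numpy as np

def noisy_state(x0, x1, t):
    # Linear interpolation path (assumed convention): x0 is noise, x1 is clean data.
    return (1.0 - t) * x0 + t * x1

def clean_estimate(x_t, v_pred, t):
    # One-step estimate of the clean sample from a velocity prediction.
    # Under the linear path the true velocity is x1 - x0, so
    # x_t + (1 - t) * (x1 - x0) equals x1 exactly.
    return x_t + (1.0 - t) * v_pred

rng = np.random.default_rng(0)
x1 = rng.normal(size=(4, 8))   # stand-in for clean data
x0 = rng.normal(size=(4, 8))   # stand-in for noise
t = 0.3

x_t = noisy_state(x0, x1, t)
v_true = x1 - x0               # ground-truth velocity on the linear path
x1_hat = clean_estimate(x_t, v_true, t)
```

Because `clean_estimate` is linear in `v_pred` with slope `1 - t`, any loss computed on `x1_hat` (for example a cycle-consistency term) has a gradient that reaches the velocity predicted at the noisy state `x_t`. In an autodiff framework that is one simple way a clean-prediction loss could influence training at noisy states.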
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC





