InstantRetouch: Efficient and High-Fidelity Instruction-Guided Image Retouching with Bilateral Space
Title: InstantRetouch: Achieving High-Fidelity, Instruction-Driven Image Retouching via Efficient Bilateral Space Manipulation
Abstract:
Language-guided photo retouching modifies color and tone while preserving the image’s geometry and texture. Diffusion-based approaches have recently shown superior visual aesthetics, but they often struggle with fidelity, because of their generative nature, and with efficiency, because iterative sampling is slow. To address these limitations, we introduce a retouching framework built on bilateral space manipulation, which offers a compact, content-decoupled representation of the edit. Rather than directly altering pixels or image latents, our model predicts a low-resolution bilateral grid of affine transformations. The grid is sliced with a learned guidance map to obtain per-pixel transformations, which are then applied to the full-resolution image, ensuring both high fidelity and computational efficiency.
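The slice-and-apply step described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `slice_and_apply`, the grid layout `(Gh, Gw, Gd, 3, 4)`, and the use of trilinear interpolation are all hypothetical choices, and the guidance map is passed in as a plain array rather than produced by a learned network.

```python
import numpy as np

def slice_and_apply(grid, image, guide):
    """Slice a bilateral grid of affine transforms and apply it per pixel.

    grid  : (Gh, Gw, Gd, 3, 4) low-resolution grid of 3x4 affine matrices (assumed layout)
    image : (H, W, 3) full-resolution RGB image, floats in [0, 1]
    guide : (H, W) guidance map in [0, 1]; learned in the paper, any array here
    """
    gh, gw, gd = grid.shape[:3]
    h, w = guide.shape

    # Continuous grid coordinates for every full-resolution pixel.
    ys = np.broadcast_to(np.linspace(0, gh - 1, h)[:, None], (h, w))
    xs = np.broadcast_to(np.linspace(0, gw - 1, w)[None, :], (h, w))
    zs = np.clip(guide, 0.0, 1.0) * (gd - 1)

    def corners(coord, size):
        # Lower/upper integer cell indices and the fractional offset between them.
        lo = np.clip(np.floor(coord).astype(int), 0, size - 1)
        hi = np.minimum(lo + 1, size - 1)
        return lo, hi, coord - lo

    y0, y1, fy = corners(ys, gh)
    x0, x1, fx = corners(xs, gw)
    z0, z1, fz = corners(zs, gd)

    # Trilinear interpolation: blend the 8 neighbouring affine matrices.
    coeffs = np.zeros((h, w, 3, 4))
    for yi, wy in ((y0, 1 - fy), (y1, fy)):
        for xi, wx in ((x0, 1 - fx), (x1, fx)):
            for zi, wz in ((z0, 1 - fz), (z1, fz)):
                coeffs += (wy * wx * wz)[..., None, None] * grid[yi, xi, zi]

    # Apply each pixel's 3x4 affine transform to its homogeneous color [r, g, b, 1].
    homog = np.concatenate([image, np.ones((h, w, 1))], axis=-1)
    return np.einsum("hwij,hwj->hwi", coeffs, homog)
```

Because the output is an affine function of the input pixel colors, a grid filled with identity matrices returns the image unchanged, which gives some intuition for why this parameterization cannot hallucinate new geometry or texture the way a generative decoder can.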
To preserve the strong priors of pretrained generative models, we use Variational Score Distillation to distill a multi-step diffusion model into our bilateral grid architecture, together with a prompt alignment loss that steers the model toward following the instruction. We also present a new benchmark that assesses our method along three dimensions: fidelity, adherence to instructions, and efficiency. In comparisons against state-of-the-art retouching tools such as Gemini-2.5-Flash (Nano-Banana), our approach prevents content drift, substantially reduces latency, and produces visually appealing edits while maintaining high fidelity.
Project page: https://openimaginglab.github.io/InstantRetouch/.
Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC






