VICR: Visual In-Context Restoration for Real-World Image Super-Resolution
Original: arXiv:2606.00704v1
Abstract: Real-world image super-resolution (Real-ISR) requires balancing structural fidelity to the degraded input against the generation of plausible, realistic details. Current generative approaches to Real-ISR frequently suffer from entangled conditioning mechanisms, which can cause structural drift or produce details that lack semantic coherence. To overcome these limitations, we introduce Visual In-Context Restoration (VICR), a framework built on Diffusion Transformers (DiT) that treats Real-ISR as an image completion task. Our method features a decoupled mechanism for injecting visual priors, extracting both local and global cues from the low-quality (LQ) source image. Specifically, local cues are used to reconstruct image structures and support the synthesis of high-frequency details, whereas global cues steer the overall generation process to ensure semantic consistency. For ambiguous regions affected by heavy degradation, VICR uses an inference-time agent to refine the semantic prompts. This refinement relies on visual evidence from the LQ input and leaves the model parameters unchanged. Experimental results demonstrate that VICR achieves state-of-the-art performance across various Real-ISR benchmarks with only 127M trainable parameters.
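The abstract gives no implementation details, so the following is only a toy sketch of what "decoupled" local and global conditioning could look like when restoration is framed as completion. It is not VICR's architecture. Every name here (`in_context_block`, `w_qkv`, `w_mod`) is hypothetical, and the specific choices (mean pooling for the global cue, scale/shift modulation, a single attention head) are assumptions made for illustration. Local cues enter by placing LQ tokens in the same sequence as the tokens being denoised; the global cue enters separately as a pooled modulation vector.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def in_context_block(lq_tokens, noisy_tokens, w_qkv, w_mod):
    """Toy single-head attention block (hypothetical, not the paper's design).

    lq_tokens:    (n, d) tokens from the low-quality image (the context)
    noisy_tokens: (m, d) noisy tokens of the image to be completed
    w_qkv:        (d, 3d) joint query/key/value projection
    w_mod:        (d, 2d) projection of the global cue to scale and shift
    """
    n, d = lq_tokens.shape

    # Global path (assumption): one pooled vector summarising the LQ image,
    # mapped to a per-channel scale and shift.
    global_cue = lq_tokens.mean(axis=0)                 # (d,)
    scale, shift = np.split(global_cue @ w_mod, 2)      # each (d,)

    # Local path (assumption): LQ tokens sit in the same sequence as the
    # target tokens, so attention can copy structure position by position.
    seq = np.concatenate([lq_tokens, noisy_tokens], axis=0)   # (n + m, d)
    seq_mod = seq * (1.0 + scale) + shift

    q, k, v = np.split(seq_mod @ w_qkv, 3, axis=1)      # each (n + m, d)
    attn = softmax(q @ k.T / np.sqrt(d))                # (n + m, n + m)
    out = seq + attn @ v                                # residual connection

    # Only the target half is read out: the "completed" part of the sequence.
    return out[n:]

rng = np.random.default_rng(0)
d = 8
lq = rng.normal(size=(16, d))
noisy = rng.normal(size=(16, d))
w_qkv = rng.normal(size=(d, 3 * d)) * 0.1
w_mod = rng.normal(size=(d, 2 * d)) * 0.1
restored = in_context_block(lq, noisy, w_qkv, w_mod)
print(restored.shape)
```

In this sketch the two paths can be varied independently: replacing `lq_tokens` in the sequence changes what attention can copy from, while replacing the pooled vector changes only the modulation.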
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





