arXiv

X-Restormer++: 1st Place Solution for the UG2+ CVPR 2026 All-Weather Restoration Challenge

June 3, 2026 · Youwei Pan, Leilei Cao, Yingfang Zhu, Fengjie Zhu

Abstract:

This paper describes our winning approach to Track 1 of the 8th UG2+ Challenge (CVPR 2026), Image Restoration under All-weather Conditions. The proposed architecture extends the X-Restormer framework, whose dual-attention design combines Multi-DConv Head Transposed Attention and Overlapping Cross-Attention to capture both channel-wise global dependencies and spatially local structural features. We also integrate the spatially adaptive input scaling mechanism of Restormer-Plus.

Our method uses a two-stage training protocol and dual-model ensemble inference. In the first stage, Model B is trained from scratch on a large subset of the FoundIR training set: approximately 800 GB randomly sampled from the 4.84 TB of available data, covering five degradation categories (blur, haze, rain, snow, and composite scenarios such as simultaneous rain and haze). In the second stage, Model A is fine-tuned on the rain and snow splits of the WeatherStream dataset, initialized from the final checkpoint of Model B, which enables efficient domain adaptation with a much smaller dataset.
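The abstract does not say how the roughly 800 GB subset was drawn. As one possible illustration only, the sketch below samples files at random under a byte budget; the function name `sample_within_budget`, the per-file granularity, and the fixed seed are all assumptions, not the authors' procedure.

```python
import random


def sample_within_budget(file_sizes, budget_bytes, seed=0):
    """Visit files in random order and keep each one that still fits the budget.

    file_sizes: dict mapping a (hypothetical) file name to its size in bytes.
    Returns the chosen names and their total size.
    """
    rng = random.Random(seed)       # fixed seed so the subset is reproducible
    names = sorted(file_sizes)      # sort first so the shuffle is deterministic
    rng.shuffle(names)

    chosen, total = [], 0
    for name in names:
        size = file_sizes[name]
        if total + size > budget_bytes:
            continue                # does not fit; a later, smaller file still might
        chosen.append(name)
        total += size
    return chosen, total
```

For scale, 800 GB out of 4.84 TB is roughly one sixth of the available data (about 16.5% with 1 TB = 1000 GB).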

To better preserve structural details during training, we introduce a novel Gradient-Guided Edge-Aware (GGEA) Loss. It applies Sobel operators to the ground-truth images to generate a spatially adaptive weight map that strengthens supervision in edge and high-frequency regions. The GGEA Loss is combined with L1 and Multi-Scale SSIM losses to form the overall training objective.
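The exact form of the GGEA Loss is not given in the abstract. The NumPy sketch below shows one plausible reading: a Sobel gradient magnitude computed on the ground truth is normalized and used to up-weight a per-pixel L1 error. The weighting formula `1 + alpha * normalized_magnitude`, the `alpha` parameter, the single-channel input, and all function names are assumptions made for illustration.

```python
import numpy as np

SOBEL_X = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])
SOBEL_Y = SOBEL_X.T


def filter3x3(img, kernel):
    """Cross-correlate a 2-D image with a 3x3 kernel, using edge-replicated padding."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def edge_weight_map(gt, alpha=1.0, eps=1e-8):
    """Weight map in [1, 1 + alpha]: 1 in flat regions, larger on strong edges (assumed form)."""
    gx = filter3x3(gt, SOBEL_X)
    gy = filter3x3(gt, SOBEL_Y)
    magnitude = np.sqrt(gx ** 2 + gy ** 2)
    return 1.0 + alpha * magnitude / (magnitude.max() + eps)


def ggea_loss_sketch(pred, gt, alpha=1.0):
    """Edge-weighted L1: errors near ground-truth edges cost more than errors in flat areas."""
    weights = edge_weight_map(gt, alpha)
    return float(np.mean(weights * np.abs(pred - gt)))
```

Under this reading, the same absolute error is penalized up to `1 + alpha` times more when it falls on the strongest edge in the ground truth than when it falls in a flat region. The weights that balance this term against the L1 and Multi-Scale SSIM terms are not stated in the abstract, so they are left out here.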

At inference, we combine the predictions of both models with a weighted average, $\mathrm{out} = 0.4 \times \mathrm{out}_A + 0.6 \times \mathrm{out}_B$. The larger weight on Model B reflects its stronger generalization from large-scale pretraining. With these components, our method ranked first in the challenge.
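The weighted average above can be sketched in a few lines of NumPy. The 0.4 and 0.6 weights come from the abstract; the function name `ensemble_outputs`, the shape check, and the final clipping to [0, 1] (which assumes images are stored as floats in that range) are illustrative assumptions.

```python
import numpy as np


def ensemble_outputs(out_a, out_b, w_a=0.4, w_b=0.6):
    """Fuse two restored images of identical shape by a fixed weighted average."""
    if out_a.shape != out_b.shape:
        raise ValueError("model outputs must have the same shape")
    fused = w_a * out_a + w_b * out_b
    # Assumption: pixel values are floats in [0, 1], so clip to the valid range.
    return np.clip(fused, 0.0, 1.0)
```

Because the two weights sum to 1, the fused image stays in the same intensity range as the inputs, and the clipping only guards against small numerical overshoot.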