arXiv

Beyond Visual Fidelity: Benchmarking Super-Resolution Models for Large-Scale Remote Sensing Imagery via Downstream Task Integration

June 2, 2026 · Zhili Li, Kangyang Chai, Zhihao Wang, Xiaowei Jia, Yanhua Li, Gengchen Mai, Sergii Skakun, Dinesh Manocha, Yiqun Xie

Title: Prioritizing Task Utility Over Visual Fidelity: A Downstream-Integrated Benchmark for Large-Scale Remote Sensing Super-Resolution

Abstract:

While super-resolution (SR) methods have significantly advanced the reconstruction of high-resolution imagery from low-resolution sources, current evaluation standards often fall short of capturing their real-world value. Although higher resolution offers visual clarity and aids in monitoring, existing SR research and benchmarks predominantly rely on fidelity metrics like PSNR and SSIM. This approach overlooks the primary purpose of super-resolved images: to enhance downstream applications such as change detection, biomass estimation, and land cover classification. To address this disconnect, we present GeoSR-Bench, a novel benchmark dataset designed to evaluate SR models through the lens of downstream task integration rather than mere visual fidelity.
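For readers unfamiliar with the fidelity metrics mentioned above, PSNR can be sketched in a few lines. This is the standard textbook definition, 10 · log10(MAX² / MSE), not code from the paper; the function name `psnr`, its parameters, and the flat-list pixel representation are assumptions made for illustration.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between two equally sized pixel sequences.

    Hypothetical helper: inputs are flat sequences of pixel intensities,
    and max_value is the largest possible intensity (255 for 8-bit images).
    """
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    # Mean squared error between the reference and the super-resolved estimate.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)

    # Identical images have zero error, so PSNR is unbounded.
    if mse == 0:
        return math.inf

    return 10.0 * math.log10(max_value ** 2 / mse)
```

Note that the score depends only on pixel-wise error against the reference image; nothing in it reflects how well a downstream model performs on the super-resolved output, which is the gap the benchmark targets.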

GeoSR-Bench features high-quality, temporally aligned, and spatially co-located image pairs derived from approximately 36,000 diverse locations. The dataset covers a wide range of land cover types and spatial resolutions, from 500 m down to 0.6 m. To our knowledge, this is the first SR benchmark that explicitly links the resolution improvements achieved by SR models to their effectiveness in Earth monitoring tasks, including infrastructure mapping, biophysical variable estimation, and land cover segmentation.

We used GeoSR-Bench to assess the perceptual quality and downstream performance of various SR architectures, including GANs, transformers, neural operators, and diffusion-based models. Our experimental framework comprised 270 distinct settings (2 × 9 × 3 × 5): two cross-platform SR tasks, nine SR models, three downstream task models, and five downstream tasks per SR task. The key finding is that improvements in traditional SR metrics do not necessarily translate to better task performance; in some cases, the correlation is even negative. Conventional fidelity metrics therefore offer limited guidance for selecting models intended for downstream applications, and the results underscore the need to incorporate downstream task objectives into both the development and evaluation of SR models.
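The setting count and the correlation check described above can be sketched as follows. This is an illustrative sketch only, not the authors' evaluation code: the placeholder names (`sr_task_1`, `sr_model_1`, and so on) and the helpers `enumerate_settings`, `ranks`, and `spearman` are hypothetical. Spearman rank correlation is one plausible choice for relating a fidelity metric to a downstream score; the abstract does not say which correlation measure was used.

```python
import math
from itertools import product

# Placeholder identifiers; the abstract gives only the counts, not the names.
SR_TASKS = [f"sr_task_{i}" for i in range(1, 3)]                   # 2 cross-platform SR tasks
SR_MODELS = [f"sr_model_{i}" for i in range(1, 10)]                # 9 SR models
TASK_MODELS = [f"task_model_{i}" for i in range(1, 4)]             # 3 downstream task models
DOWNSTREAM_TASKS = [f"downstream_task_{i}" for i in range(1, 6)]   # 5 tasks per SR task


def enumerate_settings():
    """Return every (SR task, SR model, task model, downstream task) combination."""
    return list(product(SR_TASKS, SR_MODELS, TASK_MODELS, DOWNSTREAM_TASKS))


def ranks(values):
    """Assign 1-based ranks to values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    if std_x == 0 or std_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (std_x * std_y)


settings = enumerate_settings()
assert len(settings) == 2 * 9 * 3 * 5 == 270
```

Given one fidelity score (for example PSNR) and one downstream score per SR model, `spearman(fidelity_scores, task_scores)` yields a value in [-1, 1]. A negative value would correspond to the case noted above, where the model ranking by fidelity runs opposite to the ranking by task performance.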
