arXiv

HiTokSR: A Coarse-to-Fine Tokenizer with Hierarchical Codebooks for High-Fidelity Real-World Image Super-Resolution

Vector-quantized (VQ) generative models have shown strong potential for real-world image super-resolution (Real-ISR). However, current approaches mostly rely on a single unified latent space, which entangles low-frequency structure with high-frequency texture. As a result, one codebook must cover a combinatorially large set of structure-texture pairings, which limits representational capacity and leads to inefficient codebook utilization.

To overcome these limitations, we introduce HiTokSR, a hierarchical token prediction framework. Instead of a single codebook, HiTokSR splits the latent space along the channel axis into distinct, frequency-aware groups and quantizes each group with its own independent sub-codebook. This coarse-to-fine design separates global structure from fine detail, increasing combinatorial expressiveness while avoiding the optimization instability often associated with high-dimensional nearest-neighbor lookups.
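
The channel-wise split can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `grouped_quantize`, the equal-width channel groups, the random codebooks, and the sizes `G`, `K`, and `C` are all assumptions chosen for illustration, and the sketch does not model how HiTokSR makes its groups frequency-aware.

```python
import numpy as np

def grouped_quantize(z, codebooks):
    """Quantize each channel group of z with its own sub-codebook.

    z:         (N, C) array of latent vectors.
    codebooks: list of G arrays, each of shape (K, C // G).
    Returns (indices, z_q) with shapes (N, G) and (N, C).
    Hypothetical helper for illustration only.
    """
    # Split the channel axis into G equal-width groups (an assumption).
    groups = np.split(z, len(codebooks), axis=1)
    indices, quantized = [], []
    for chunk, book in zip(groups, codebooks):
        # Squared Euclidean distance from every chunk vector to every code.
        dists = ((chunk[:, None, :] - book[None, :, :]) ** 2).sum(axis=-1)
        idx = dists.argmin(axis=1)  # nearest code within this group only
        indices.append(idx)
        quantized.append(book[idx])
    return np.stack(indices, axis=1), np.concatenate(quantized, axis=1)

rng = np.random.default_rng(0)
G, K, C = 4, 16, 32  # groups, codes per sub-codebook, latent channels
codebooks = [rng.normal(size=(K, C // G)) for _ in range(G)]
z = rng.normal(size=(5, C))
indices, z_q = grouped_quantize(z, codebooks)
```

With these illustrative sizes, each nearest-neighbor search runs in `C // G = 8` dimensions rather than 32, and the `G * K = 64` stored code vectors can represent `K ** G = 65536` distinct quantized latents.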

To further improve semantic consistency, the generator incorporates priors from a vision foundation model through adaptive feature modulation, multi-scale class tokens, and a representation alignment loss. In addition, we propose an index-level perturbation strategy, applied while fine-tuning the decoder, to reduce the train-test discrepancy in discrete token prediction. Extensive experiments on real-world benchmarks show that HiTokSR achieves state-of-the-art results in both reconstruction fidelity and perceptual quality.
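
The abstract does not specify how the index-level perturbation is carried out. One plausible reading, sketched below, is that a small fraction of ground-truth token indices is replaced with random ones before decoding, so the decoder is fine-tuned on inputs that resemble imperfect predictions. The function name `perturb_indices`, the uniform replacement rule, and the `rate` parameter are assumptions, not details from the paper.

```python
import numpy as np

def perturb_indices(token_ids, codebook_size, rate, rng):
    """Randomly corrupt a fraction of discrete token indices.

    token_ids:     integer array of codebook indices (any shape).
    codebook_size: number of entries in the codebook.
    rate:          probability that each position is selected for replacement
                   (hypothetical hyperparameter).
    Returns (perturbed_ids, mask), where mask marks the selected positions.
    """
    mask = rng.random(token_ids.shape) < rate
    # Uniform replacements; a selected position may draw its original value.
    random_ids = rng.integers(0, codebook_size, size=token_ids.shape)
    return np.where(mask, random_ids, token_ids), mask

rng = np.random.default_rng(0)
clean_ids = rng.integers(0, 16, size=(8, 8))  # stand-in for encoder tokens
noisy_ids, mask = perturb_indices(clean_ids, codebook_size=16, rate=0.1, rng=rng)
```

In a fine-tuning loop under this reading, the decoder would receive `noisy_ids` while the reconstruction target stays the clean high-resolution image.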


Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
