arXiv

HyperVQ: Enabling Hyperprior Entropy Modeling for VQ-Based Generative Image Compression

June 2, 2026 · Niu Yi, Xu Tianyi, Ma Mingming, Wang Xinkun

Title: HyperVQ: Facilitating Hyperprior Entropy Modeling in VQ-Based Generative Image Compression

Abstract:

While Vector Quantization (VQ)-based generative image compression has delivered exceptional perceptual fidelity, current VQ codecs are hindered by two core constraints. First, these systems typically rely on static frequency distributions rather than content-adaptive entropy modeling, which results in suboptimal coding efficiency. Second, the fundamental tension between discrete indices and continuous priors obstructs true end-to-end joint Rate-Distortion (RD) optimization.

To address these challenges, we introduce HyperVQ, a framework that builds a high-performance hyperprior entropy model for VQ-based codecs. The central idea of HyperVQ is to move probability modeling entirely into the continuous embedding space. Rather than predicting probabilities for discrete symbols directly, HyperVQ predicts a multivariate Gaussian distribution over the high-dimensional continuous latents. Treating the discrete codebook entries as fixed "anchors" within this space, the method converts the continuous Gaussian density into categorical index probabilities through relative distance calculations. This yields a spatially adaptive entropy model and keeps the cross-entropy rate objective fully differentiable, so the network can optimize the RD trade-off directly during training.
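The density-to-index conversion described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a diagonal covariance, and it assumes that the "relative distance calculation" is a softmax over the Gaussian log-density evaluated at each codebook anchor. The names `index_probabilities` and `rate_bits`, the toy codebook, and the values of `mu` and `sigma` are all hypothetical.

```python
import math

def index_probabilities(mu, sigma, codebook):
    """Map a diagonal Gaussian N(mu, diag(sigma^2)) to a categorical
    distribution over codebook indices (assumed form: softmax of the
    Gaussian log-density evaluated at each codebook anchor)."""
    # Log-density at each anchor, up to a constant shared by all anchors.
    # The Gaussian normalizer depends only on sigma, so it cancels below.
    logits = [
        -0.5 * sum(((c - m) / s) ** 2 for c, m, s in zip(entry, mu, sigma))
        for entry in codebook
    ]
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]

def rate_bits(probs, index):
    """Cross-entropy rate, in bits, of coding `index` under `probs`."""
    return -math.log2(probs[index])

# Toy 2-D codebook with four anchors (hypothetical values).
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mu = [0.9, 0.1]     # mean predicted by the hyperprior (assumed)
sigma = [0.5, 0.5]  # per-dimension standard deviation (assumed)

probs = index_probabilities(mu, sigma, codebook)
bits = rate_bits(probs, 1)  # cost of coding the anchor nearest to mu
```

Because the predicted mean lies close to the second anchor, that index receives most of the probability mass and costs fewer than the 2 bits a uniform code over four entries would need. Written in an automatic-differentiation framework, the same expressions would be differentiable with respect to `mu` and `sigma`, which is the property the rate objective relies on.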

To ensure practical applicability, we design the lightweight H Block and the Probability Estimation Engine (PEE), which support highly parallel, millisecond-level inference. Our experiments show that HyperVQ functions as a universal module compatible with various VQ architectures, including single-scale, large-codebook, and residual VQ (RVQ) models. It achieves an average bitrate reduction of 18.5%, which is 7.28 times the savings provided by traditional Huffman coding. This work establishes a robust, RD-controllable base for the next generation of generative image compression technologies.
