arXiv

Where to Refine, When to Stop: Rethinking Redundancy via Latent Discrepancy for Efficient Visual Autoregressive Generation

Title: Optimizing Redundancy in Visual Autoregressive Models: A Latent Discrepancy Approach for Efficient Generation

Visual Autoregressive (VAR) models are renowned for producing high-quality images, yet they often face considerable inference latency, particularly when generating at high resolutions. While recent acceleration strategies have attempted to address this by using heuristic measures based on layer features to prune tokens, these methods frequently struggle with complex contextual semantics. Consequently, they often fail to accurately identify redundant computations and lack adaptability across different prompts.

To address these limitations, this study reevaluates the concept of redundancy in VAR models by examining its direct impact on pixel-space generation. We introduce "Latent Discrepancy," a unified metric designed to quantify a token’s contribution by measuring fluctuations in model states throughout the generation process. Our analysis indicates that redundancy can be pinpointed with greater precision when guided by signals from image latents or pixel-space data. Furthermore, we observed that during classifier-free guidance (CFG), the convergence pattern of the discrepancy between conditional and unconditional branches displays significant dynamics that vary depending on the prompt.
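As a rough illustration of the idea, the sketch below scores each token by how much its latent vector changes between two consecutive generation states and keeps only the most active tokens. The abstract does not give the exact definition of Latent Discrepancy, so the L2 distance, the `keep_ratio` parameter, and both function names are assumptions made for this example, not the authors' formulation.

```python
import math


def token_discrepancy(prev_latent, curr_latent):
    """Per-token L2 distance between two latent states.

    Each latent is a list of per-token vectors. Using L2 distance is an
    assumption; the abstract does not specify the actual metric.
    """
    return [math.dist(p, c) for p, c in zip(prev_latent, curr_latent)]


def select_active_tokens(discrepancy, keep_ratio):
    """Return the indices of the tokens with the largest discrepancy.

    keep_ratio is a hypothetical knob: the fraction of tokens to keep refining.
    """
    k = max(1, round(keep_ratio * len(discrepancy)))
    ranked = sorted(range(len(discrepancy)),
                    key=lambda i: discrepancy[i], reverse=True)
    return sorted(ranked[:k])


# Toy latents: four tokens, two channels each.
prev = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
curr = [[0.0, 0.0], [1.0, 2.0], [5.0, 6.0], [3.0, 3.5]]

scores = token_discrepancy(prev, curr)
active = select_active_tokens(scores, keep_ratio=0.5)
```

In this toy case the tokens that barely changed would be treated as redundant and skipped, while the two that moved most would continue to be refined.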

Leveraging these insights, we propose LD-Pruning (Latent Discrepancy Pruning), a training-free framework that eliminates redundancy through latent discrepancy. This approach combines decoding-free region selection with adaptive skipping of the unconditional branch. Extensive experimental results demonstrate that LD-Pruning significantly lowers inference latency without compromising generation quality, achieving a speedup of up to 2.35x on the Infinity-8B model.
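The adaptive skipping of the unconditional branch can be pictured with a minimal sketch like the one below. It applies the standard classifier-free guidance combination, `uncond + scale * (cond - uncond)`, and stops evaluating the unconditional branch once the gap between the two branches falls below a tolerance. The stopping rule, the `tol` threshold, the fallback to the conditional output alone, and the `cond_fn` / `uncond_fn` callables are all assumptions for illustration; the abstract does not describe the actual criterion used by LD-Pruning.

```python
import math


def cfg_combine(cond, uncond, scale):
    """Standard classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


def branch_gap(cond, uncond):
    """Root-mean-square difference between the two branch outputs."""
    total = sum((c - u) ** 2 for c, u in zip(cond, uncond))
    return math.sqrt(total / len(cond))


def generate_with_adaptive_cfg(cond_fn, uncond_fn, num_steps, scale, tol):
    """Run both branches until their gap drops below tol, then skip uncond.

    cond_fn and uncond_fn are hypothetical stand-ins for model forward passes.
    After the gap converges, this sketch simply uses the conditional output.
    """
    outputs, uncond_calls, skip = [], 0, False
    for step in range(num_steps):
        cond = cond_fn(step)
        if skip:
            outputs.append(cond)
            continue
        uncond = uncond_fn(step)
        uncond_calls += 1
        outputs.append(cfg_combine(cond, uncond, scale))
        if branch_gap(cond, uncond) < tol:
            skip = True  # gap has converged; later steps skip the uncond pass
    return outputs, uncond_calls


# Toy branches whose gap halves at every step: 1, 0.5, 0.25, 0.125, ...
outputs, uncond_calls = generate_with_adaptive_cfg(
    cond_fn=lambda s: [1.0, 1.0],
    uncond_fn=lambda s: [1.0 - 0.5 ** s] * 2,
    num_steps=6, scale=2.0, tol=0.2,
)
```

Because the convergence point depends on the inputs, a prompt whose branches agree early would save more unconditional passes than one whose branches stay far apart, which matches the prompt-dependent behavior the abstract describes.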


Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
