Collaborative Few-Step Distillation and Low-Bit Quantization for Wan2.2 Dual-Expert Video Diffusion Models
Title: Optimizing Wan2.2 Dual-Expert Video Diffusion Models Through Joint Few-Step Distillation and Low-Bit Quantization
Abstract:
Large-scale video diffusion models deliver exceptional visual fidelity, but their deployment is hindered by high computational cost: inference requires many denoising iterations, and large parameter counts impose substantial memory requirements. To address these challenges, this study introduces a deployment-focused compression framework for the Wan2.2-T2V-A14B architecture that integrates few-step distribution-matching distillation with low-bit quantization.
The proposed pipeline follows the model's dual-expert denoising structure, calibrating the high-noise and low-noise branches separately. It keeps sensitive input layers out of low-bit quantization and employs HiF4-style low-bit representations to improve dynamic range coverage. Notably, quantization is calibrated on the distilled few-step student model rather than on the original, longer-step trajectory, which reduces the mismatch between the activation distributions seen during calibration and those seen at inference. With this co-designed strategy, the quantized model performs comparably to the full-precision model at the same step count, while on average outperforming the original full-precision baseline at both 8 and 20 steps. Among the evaluated configurations, the 20-step setting offers the best balance between quality and efficiency.
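The calibration flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: the 4-bit format here is a plain symmetric per-group integer quantizer used as a stand-in for HiF4, and the names `route_expert`, `calibrate_activation_scales`, `quantize_expert`, the layer name `patch_embedding`, and the switch point `boundary=0.5` are all hypothetical placeholders chosen for illustration.

```python
import numpy as np

def fake_quant_int4(w, group_size=32):
    """Symmetric per-group 4-bit fake quantization.

    Stand-in for a low-bit format; this is NOT HiF4.
    Assumes w.size is divisible by group_size.
    """
    flat = w.reshape(-1, group_size)
    scale = np.abs(flat).max(axis=1, keepdims=True) / 7.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid division by zero
    q = np.clip(np.round(flat / scale), -8, 7)
    return (q * scale).reshape(w.shape)

def route_expert(t, boundary=0.5):
    """Pick the expert for timestep t (boundary is a hypothetical value)."""
    return "high_noise" if t >= boundary else "low_noise"

def calibrate_activation_scales(timesteps, sample_activation, boundary=0.5):
    """Collect a separate activation scale for each expert.

    `timesteps` should be the few-step schedule of the distilled student,
    so calibration sees the same activations as inference.
    `sample_activation(expert_name, t)` is a hypothetical callback that
    returns an activation array recorded at timestep t.
    """
    absmax = {"high_noise": 0.0, "low_noise": 0.0}
    for t in timesteps:
        name = route_expert(t, boundary)
        act = sample_activation(name, t)
        absmax[name] = max(absmax[name], float(np.abs(act).max()))
    return {name: value / 7.0 for name, value in absmax.items()}

def quantize_expert(weights, protected=("patch_embedding",)):
    """Quantize every layer except the protected (sensitive) ones."""
    return {
        name: (w if name in protected else fake_quant_int4(w))
        for name, w in weights.items()
    }
```

In this sketch, each expert would be passed through `quantize_expert` on its own weights, and `calibrate_activation_scales` would be run once over the student's short timestep schedule so that each branch receives scales drawn only from the timesteps it actually serves.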
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC




