Why Are DMD Students Lazy? Understanding the Copying Behavior in Few-Step Distillation
Title: Unpacking the "Laziness" of DMD Students: The Mechanics of Copying in Few-Step Distillation
Abstract: Distribution Matching Distillation (DMD) compresses pre-trained diffusion models into few-step generators by matching their noised distributions across noise levels. In theory, this distribution-level supervision is indifferent to the teacher's specific noise-data pairings, so the student is free to remap latent noise, and in low-dimensional settings it frequently does. In high-dimensional settings, however, we observe the opposite: distilled students spontaneously replicate the teacher's original noise-data pairings, a behavior we call "copying." Our analysis shows that copying is neither a side effect of adversarial objectives nor a consequence of teacher memorization. Instead, the evidence points to copying as an emergent property of the limited geometric freedom available to the student during high-dimensional distillation.
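To make the distinction between matching a distribution and matching a pairing concrete, here is a minimal one-dimensional sketch. It is not taken from the paper: the `teacher` map, the `student_remapped` map, and the sample size are all hypothetical choices made for illustration. Because standard Gaussian noise is symmetric, feeding the reflected noise `-z` through the same map yields the same output distribution while pairing each noise sample with a different output.

```python
import numpy as np

def teacher(z):
    # Hypothetical stand-in for a teacher's noise-to-data map (an assumption,
    # not the paper's model). Any fixed deterministic map would do.
    return np.tanh(z) + 0.5 * z

def student_remapped(z):
    # A "non-copying" student: since -z is also standard normal, this map has
    # the same output distribution as the teacher but a different pairing.
    return teacher(-z)

rng = np.random.default_rng(0)
z = rng.standard_normal(100_000)

x_teacher = teacher(z)
x_student = student_remapped(z)

# Distribution-level comparison: empirical quantiles of the two output sets.
qs = np.linspace(0.05, 0.95, 19)
quantile_gap = np.max(
    np.abs(np.quantile(x_teacher, qs) - np.quantile(x_student, qs))
)

# Pairing-level comparison: correlation of outputs produced from the SAME noise.
# A copying student would score near +1; this remapped student scores near -1.
pairing_corr = np.corrcoef(x_teacher, x_student)[0, 1]

print(f"max quantile gap: {quantile_gap:.4f}")
print(f"pairing correlation: {pairing_corr:.4f}")
```

A loss that only compares output distributions cannot tell these two generators apart, which is why copying is not implied by the objective itself. In this sketch, a per-sample statistic such as `pairing_corr` is what separates a copying student from a remapping one.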
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





