EMoE: Training-Free Expert Disagreement for Uncertainty-Aware Text-to-Image Diffusion
Large-scale text-to-image diffusion models rarely give a dependable signal of when an output is likely to be misaligned with its prompt, particularly when the underlying training data is inaccessible. This study investigates whether expert disagreement within pre-trained mixture-of-experts (MoE) diffusion architectures can serve as a robust estimate of epistemic uncertainty.
We propose EMoE, a training-free approach. The method separates expert-specific computational pathways at an initial MoE layer while keeping the starting noise identical across paths. By computing the variance among the latent representations produced after the first denoising step, EMoE gives an uncertainty estimate for the prompt before image synthesis completes. It requires neither auxiliary neural networks nor the training of diffusion ensembles.
Evaluations on the COCO and CC3M datasets show that EMoE consistently outperforms both diffusion-specific and router-based baselines at ranking prompts by text-image alignment quality. When applied to multilingual inputs, EMoE also reveals systematic, language-dependent variation in both generation quality and expert disagreement, highlighting the impact of shared vocabulary. These findings establish EMoE as a viable diagnostic tool for assessing prompt risk, model coverage, and bias in MoE-based text-to-image diffusion systems.
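One plausible way to check how well an uncertainty score ranks prompts by alignment quality is a rank correlation between the two. The sketch below is only an illustration under that assumption: the paper's actual evaluation protocol is not given here, the helper `rank_correlation` is hypothetical, and the numbers in the usage section are made-up toy values rather than results.

```python
import numpy as np


def rank_correlation(uncertainty, alignment) -> float:
    """Spearman-style rank correlation between two score lists.

    Ranks come from a double argsort, so ties are broken by position
    rather than averaged (a simplification made for this sketch).
    """
    u = np.asarray(uncertainty, dtype=float)
    a = np.asarray(alignment, dtype=float)
    if u.shape != a.shape or u.ndim != 1 or u.size < 2:
        raise ValueError("expected two 1-D arrays of equal length >= 2")
    u_rank = np.argsort(np.argsort(u))
    a_rank = np.argsort(np.argsort(a))
    # Pearson correlation of the ranks equals Spearman's rho without ties.
    return float(np.corrcoef(u_rank, a_rank)[0, 1])


if __name__ == "__main__":
    # Toy values only: higher uncertainty paired with lower alignment.
    toy_uncertainty = [0.1, 0.2, 0.3, 0.4]
    toy_alignment = [0.9, 0.8, 0.6, 0.5]
    print(rank_correlation(toy_uncertainty, toy_alignment))
```

Under this reading, a strongly negative value would mean that prompts with higher expert disagreement tend to receive lower alignment scores.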
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC




