Calibrating Uncertainty for Zero-Shot Adversarial CLIP
Abstract: While CLIP demonstrates robust performance in zero-shot classification, it remains significantly susceptible to adversarial attacks. Previous approaches to adversarial fine-tuning have largely focused on aligning predicted logits between clean and adversarial samples, a strategy that neglects uncertainty calibration and risks compromising zero-shot generalization capabilities. In the context of reliable uncertainty estimation, it is generally expected that predictive uncertainty should rise as inputs grow more challenging or deviate from the training distribution. However, we frequently observe the reverse in adversarial scenarios: perturbations not only reduce accuracy but also suppress uncertainty, resulting in significant miscalibration and over-confidence. This phenomenon highlights a critical reliability gap that extends beyond mere robustness. To address this issue, we introduce an adversarial fine-tuning objective for CLIP that balances both accuracy and uncertainty. By reparameterizing CLIP outputs as the concentration parameters of a Dirichlet distribution, we develop a unified representation that encapsulates both relative semantic structure and confidence magnitude. This approach facilitates holistic distribution alignment under perturbations, moving past single-logit anchoring to restore calibrated uncertainty. Experiments conducted across various zero-shot benchmarks indicate that our method substantially enhances uncertainty calibration while maintaining competitive adversarial robustness and preserving clean accuracy.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
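
The abstract's idea of a Dirichlet representation that carries both relative semantic structure and confidence magnitude can be illustrated with a small sketch. The abstract does not give the actual reparameterization, so this is an assumption and not the authors' method: the mapping alpha_k = exp(z_k), the function name dirichlet_from_logits, and the example logits are all hypothetical. Under this mapping the Dirichlet mean equals the softmax of the logits (the relative structure), while the total concentration S = sum_k alpha_k is a separate magnitude that softmax alone discards.

```python
import math

def dirichlet_from_logits(logits):
    """Map logits to Dirichlet concentrations.

    Assumption for illustration only: alpha_k = exp(z_k). The source
    abstract does not specify the mapping it uses.
    """
    alpha = [math.exp(z) for z in logits]
    strength = sum(alpha)                  # total concentration S (confidence magnitude)
    mean = [a / strength for a in alpha]   # Dirichlet mean, equal to softmax(logits)
    return alpha, mean, strength

# Two hypothetical logit vectors that differ only by a constant shift.
logits_a = [2.0, 1.0, 0.0]
logits_b = [z + 3.0 for z in logits_a]

_, mean_a, strength_a = dirichlet_from_logits(logits_a)
_, mean_b, strength_b = dirichlet_from_logits(logits_b)

# The means (class probabilities) coincide, but the strengths differ.
print(mean_a, strength_a)
print(mean_b, strength_b)
```

The two inputs yield identical class probabilities but different total concentrations, which is the kind of information that matching softmax predictions between clean and adversarial inputs cannot see and that a full distribution-level alignment could, in principle, constrain.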




