Geometry-Aware Distillation for Prompt Tuning Biomedical Vision-Language Models
Abstract:
Prompt-based and adapter-based tuning of vision-language models (VLMs) has gained traction in medical imaging, where the sensitivity of clinical data necessitates frozen backbones and the scarcity of annotations limits full fine-tuning. However, existing approaches generally optimize only for the ground-truth class and treat all other classes as equally wrong. This neglects clinically significant relationships between classes and often yields unstable decision boundaries when supervision is scarce. To address this, we introduce Omni-Geometry Knowledge Distillation (OGKD), a framework that embeds class-relation structure into the teacher model, producing directional targets that preserve the ground truth while respecting inter-class geometric relationships. Using these targets, we formulate two distillation losses: Global Geometry-Aware Distillation (GAD), which operates on the global image token, and Label-Guided Geometry Distillation (LGD), which extends the same geometric view to attentive patch tokens for finer-grained alignment. In extensive experiments on 11 medical datasets, covering both base-to-novel and few-shot settings, OGKD improves accuracy by an average absolute margin of 1.7% to 2.8% over leading VLM adaptation methods. It also generalizes robustly to unseen classes and provides more reliable predictions than alternative techniques. The source code is available at https://github.com/tientrandinh/OGKD.
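The idea of a target that keeps the ground truth on top while still reflecting class relationships can be illustrated with a small sketch. This is not the OGKD formulation, which the abstract does not specify. It assumes a generic relation-aware soft label: a one-hot vector blended with a softmax over a hypothetical row of class similarities, then compared with the student prediction through a KL divergence. The function names, the mixing weight `alpha`, the temperature `tau`, and the similarity values are all illustrative assumptions.

```python
import math


def softmax(xs, tau=1.0):
    """Temperature-scaled softmax with max-subtraction for numerical stability."""
    scaled = [x / tau for x in xs]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def relation_aware_target(sim_row, label, alpha=0.6, tau=0.5):
    """Blend a one-hot label with a softmax over class similarities.

    sim_row: hypothetical similarities between the true class and every class.
    alpha:   weight on the one-hot part; alpha > 0.5 guarantees that the
             ground-truth class keeps the largest probability.
    """
    if not 0.5 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0.5, 1.0]")
    rel = softmax(sim_row, tau)
    return [
        alpha * (1.0 if j == label else 0.0) + (1.0 - alpha) * r
        for j, r in enumerate(rel)
    ]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )


def distillation_loss(student_logits, target):
    """KL divergence between the soft target and the student's softmax output."""
    return kl_divergence(target, softmax(student_logits))


# Hypothetical 3-class example: class 1 is clinically close to class 0,
# class 2 is not. The similarity values are made up for illustration.
sim_row = [1.0, 0.7, 0.1]
target = relation_aware_target(sim_row, label=0)

# Two students that both predict class 0 but distribute the remaining
# probability differently between the related and the unrelated class.
loss_related = distillation_loss([3.0, 1.5, -1.0], target)
loss_unrelated = distillation_loss([3.0, -1.0, 1.5], target)
```

In this toy setting the student that places its residual probability on the related class incurs the smaller loss, whereas a plain one-hot cross-entropy would score the two students identically.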
Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC





