PAND: Prompt-Aware Neighborhood Distillation for Lightweight Fine-Grained Visual Classification
Abstract: Transferring knowledge from large Vision-Language Models (VLMs) to efficient networks is important but difficult in Fine-Grained Visual Classification (FGVC), mainly because of the limitations of fixed prompts and global alignment. To address these limitations, we introduce PAND (Prompt-Aware Neighborhood Distillation), a two-stage framework that decouples semantic calibration from structural transfer. The first stage, Prompt-Aware Semantic Calibration, generates adaptive semantic anchors. The second stage applies neighborhood-aware structural distillation to constrain the student model's local decision structure. PAND consistently outperforms state-of-the-art methods on four FGVC benchmarks. In particular, our ResNet-18 student reaches 76.09% accuracy on CUB-200, outperforming the strong VL2Lite baseline by 3.4%. Code is available at https://github.com/LLLVTA/PAND.
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC



