Title: GiPL: Generative Augmented Iterative Pseudo-Labeling for Cross-Domain Few-Shot Object Detection
Abstract:
Vision-language foundation models have demonstrated strong zero-shot generalization potential in Cross-Domain Few-Shot Object Detection (CD-FSOD). Nevertheless, fine-tuning them is hindered by two major obstacles: underutilization of the support set caused by sparse single-instance annotations, and significant overfitting caused by the scarcity of target-domain samples. To overcome these limitations, we introduce GiPL, a streamlined two-branch training framework. The first branch employs iterative pseudo-label self-training: it runs zero-shot inference on the support set to produce reliable pseudo-annotations, which are then merged with the ground-truth labels. The model is iteratively optimized on the fused labels, making fuller use of the support set. The second branch is a generative data augmentation pipeline driven by large vision-language models. It synthesizes target-domain-aligned images containing multiple annotated objects, expanding the training set and mitigating overfitting. Comprehensive experiments on three challenging CD-FSOD datasets (RUOD, CARPK, and CarDD) under 1-, 5-, and 10-shot settings show that GiPL consistently outperforms current state-of-the-art methods by a substantial margin. The code is available at https://github.com/z-yaz/CDiscover.
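As a rough illustration of the label-fusion step in the first branch, the sketch below merges confident detector predictions with the ground-truth boxes of one support image. It is not taken from the GiPL code: the function names, the score threshold, and the IoU-based duplicate check are all assumptions chosen for the example, and the detector and the fine-tuning loop are left out.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_labels(
    gt: List[Tuple[Box, str]],
    preds: List[Tuple[Box, str, float]],
    score_thr: float = 0.5,  # assumed confidence cutoff, not from the paper
    iou_thr: float = 0.5,    # assumed overlap cutoff, not from the paper
) -> List[Tuple[Box, str]]:
    """Keep all ground-truth boxes and add confident, non-overlapping predictions."""
    merged = list(gt)
    # Visit predictions from most to least confident.
    for box, label, score in sorted(preds, key=lambda p: -p[2]):
        if score < score_thr:
            continue  # too uncertain to use as a pseudo-label
        if any(iou(box, kept) >= iou_thr for kept, _ in merged):
            continue  # duplicates a box that is already kept
        merged.append((box, label))
    return merged


gt = [((0.0, 0.0, 10.0, 10.0), "car")]
preds = [
    ((1.0, 1.0, 10.0, 10.0), "car", 0.9),    # overlaps the ground-truth box
    ((20.0, 20.0, 30.0, 30.0), "car", 0.8),  # new confident object
    ((40.0, 40.0, 50.0, 50.0), "car", 0.3),  # below the score threshold
]
merged = merge_labels(gt, preds)
```

In an iterative scheme of this kind, the merged labels would be used for one round of fine-tuning, after which the updated model would produce the next round of predictions.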
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





