Beyond Objects: Contextual Synthetic Data Generation for Fine-Grained Classification
While text-to-image (T2I) models are becoming standard tools for creating synthetic datasets, producing high-quality training data for classification tasks remains a significant hurdle. Although fine-tuning a T2I model using a limited set of real-world examples can enhance output quality, this approach often leads to overfitting and a consequent loss of diversity in the generated samples. To address these challenges in the context of fine-grained classification, we introduce BOB (BeyondOBjects), a novel fine-tuning strategy.
BOB first extracts class-agnostic attributes, such as object pose and scene background, from a small collection of real examples. These attributes are supplied as explicit conditions while the T2I model is fine-tuned, and are then marginalized out during data generation. This design choice helps prevent overfitting, preserves the model's generative prior, lowers estimation error, and reduces spurious correlations between classes.
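The conditioning-then-marginalizing idea can be illustrated at the prompt level. The abstract does not say how BOB extracts the attributes or how the marginalization is carried out, so the sketch below rests on stated assumptions: each real image yields a short pose and background description, fine-tuning prompts pair the class name with that description, and generation marginalizes the context by sampling it from a pool shared by all classes. This amounts to ancestral sampling of p(x | y) = sum_c p(x | y, c) p(c), with p(c) estimated from the pooled examples. The names `Context`, `conditioned_prompt`, and `marginalized_prompts`, the prompt template, and the aircraft examples are hypothetical, and only the prompt bookkeeping is shown, not the T2I fine-tuning itself.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """Class-agnostic attributes of one real image (hypothetical schema)."""
    pose: str
    background: str


def conditioned_prompt(class_name: str, ctx: Context) -> str:
    """Prompt pairing the class with explicit context (assumed fine-tuning format)."""
    return f"a photo of a {class_name}, {ctx.pose}, {ctx.background}"


def marginalized_prompts(class_name: str, context_pool: list[Context],
                         n: int, seed: int = 0) -> list[str]:
    """Generation-time prompts: each context is drawn from a pool shared by all
    classes rather than from this class's own examples, i.e. ancestral sampling
    of c ~ p(c) followed by x ~ p(x | y, c)."""
    rng = random.Random(seed)
    return [conditioned_prompt(class_name, rng.choice(context_pool)) for _ in range(n)]


# Hypothetical few-shot data: contexts extracted from real images of each class.
extracted = {
    "Boeing 737": [Context("side view", "parked on a runway")],
    "Airbus A320": [Context("banking left", "against a cloudy sky")],
}

# Fine-tuning pairs: each real image keeps its own context as an explicit condition.
finetune_set = [conditioned_prompt(cls, c) for cls, ctxs in extracted.items() for c in ctxs]

# Pool contexts across classes so that no context is tied to a single class.
pool = [c for ctxs in extracted.values() for c in ctxs]
for prompt in marginalized_prompts("Boeing 737", pool, n=4):
    print(prompt)
```

Because contexts are drawn from every class, a Boeing 737 prompt can receive a background first observed with an Airbus A320, which is one way to discourage a downstream classifier from tying a class to a particular scene.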
Extensive experiments across multiple T2I models, backbones, and datasets show that BOB achieves state-of-the-art performance in low-shot fine-grained classification when synthetic data is added. On the Aircraft dataset, for example, BOB outperforms DataDream by 7.4 percentage points (from 50.0% to 57.4%) when a CLIP classifier is fine-tuned on five real images together with 100 synthetic ones. Furthermore, in three of the four benchmarks, fine-tuning downstream models on five real images augmented with BOB outperforms fine-tuning on ten real images. Overall, BOB outperforms prior methods in 18 of 24 experimental settings, with accuracy gains of more than 2% in 14 of them.
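Reported accuracy gains can be read either as absolute differences (percentage points) or as improvements relative to the baseline. A quick check of the Aircraft figures, using a hypothetical helper `gains`, separates the two:

```python
def gains(baseline: float, new: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = new - baseline
    relative = 100.0 * absolute / baseline
    return absolute, relative


# Aircraft, CLIP classifier: DataDream 50.0% vs. BOB 57.4%, as reported above.
abs_pp, rel_pct = gains(50.0, 57.4)
print(f"absolute: {abs_pp:.1f} points, relative: {rel_pct:.1f}%")
# prints: absolute: 7.4 points, relative: 14.8%
```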
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC