FiSeR: Fine-Grained Source Representations for Cross-Domain AI Image Detection
Abstract:
Although real-world synthetic image detectors typically demonstrate robust performance within their training domains, they frequently struggle to generalize when facing domain shifts. Our analysis, utilizing unsupervised UMAP projections, reveals that while natural and synthetic features retain a degree of separability on unseen datasets, detection accuracy still declines. This discrepancy suggests that the classification head tends to overfit to artifacts specific to the training domain. Consequently, the primary challenge lies in acquiring more transferable representations to ensure that decision criteria remain stable and resilient against domain variations.
Exploiting the fact that synthetic images originate from many different generators, we introduce a hierarchical contrastive learning framework. It sharpens the separation between natural and synthetic images while preserving information about generator identity. The model jointly optimizes two objectives: a coarse contrastive loss that separates natural images from synthetic ones, and a fine-grained contrastive loss that separates synthetic images from one another by generator identity.
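The two objectives described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact loss, so both terms are assumed here to take a supervised contrastive (SupCon-style) form, and the names `supcon_loss`, `hierarchical_loss`, `temperature`, and `fine_weight`, along with their default values, are hypothetical.

```python
import math


def _normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def supcon_loss(embeddings, labels, temperature=0.1):
    """SupCon-style loss, averaged over anchors that have at least one positive.

    Assumption: this generic form stands in for the paper's unspecified loss.
    """
    z = [_normalize(v) for v in embeddings]
    n = len(z)
    total, counted = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled cosine similarity to every other sample.
        sims = {j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
                for j in range(n) if j != i}
        # Numerically stable log-sum-exp over all other samples.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


def hierarchical_loss(embeddings, is_synthetic, generator_ids,
                      fine_weight=1.0, temperature=0.1):
    """Return (total, coarse, fine).

    coarse: natural vs. synthetic, computed over the whole batch.
    fine:   generator identity, computed over synthetic samples only.
    """
    coarse = supcon_loss(embeddings, is_synthetic, temperature)
    syn = [i for i, flag in enumerate(is_synthetic) if flag]
    fine = supcon_loss([embeddings[i] for i in syn],
                       [generator_ids[i] for i in syn], temperature)
    return coarse + fine_weight * fine, coarse, fine


# Toy batch: two natural images, then two images each from generators "A" and "B".
batch = [[1, 0, 0], [1, 0.1, 0],
         [-1, 0.3, 0], [-1, 0.35, 0],
         [-1, -0.3, 0], [-1, -0.35, 0]]
total, coarse, fine = hierarchical_loss(
    batch, [0, 0, 1, 1, 1, 1], [None, None, "A", "A", "B", "B"])
print(total, coarse, fine)
```

In this sketch the generator labels of natural images are ignored, so the fine-grained term only shapes the synthetic part of the embedding space, while the coarse term still pulls all synthetic samples toward one another and away from natural ones.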
Using the WildFake dataset, our method improves average AUROC by +10.22 over the strong baseline DIRE in cross-domain tests on Chameleon, AIGIBench, Community Forensics, and GenImage under identical settings. In few-shot adaptation, where the backbone is frozen and an SVM head is trained on just 10 labeled samples per class, it raises AUROC by +10.64 on AIGIBench and +17.41 on Chameleon, averaged across 12 widely used detectors. The source code is available at: https://github.com/heyongxin233/FiSeR.
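The few-shot protocol (frozen backbone, SVM head, 10 labeled samples per class, AUROC as the metric) can be illustrated as follows. This is a hedged sketch rather than the released code: it assumes features have already been extracted by the frozen backbone, it assumes a linear kernel (the abstract does not specify one), and it swaps in a small Pegasos-style hinge-loss trainer in place of a library SVM so the example stays dependency-free. All function names and hyperparameters are hypothetical.

```python
import random


def sample_few_shot(labels, shots_per_class=10, seed=0):
    """Pick `shots_per_class` random indices for each class label."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    chosen = []
    for y in sorted(by_class):
        chosen.extend(rng.sample(by_class[y], shots_per_class))
    return chosen


def train_linear_svm(features, labels, reg=0.01, epochs=200, seed=0):
    """Linear hinge-loss head trained with Pegasos-style subgradient steps.

    Assumption: stand-in for a library SVM. Labels: 0 = natural, 1 = synthetic.
    """
    rng = random.Random(seed)
    w, b, step = [0.0] * len(features[0]), 0.0, 0
    order = list(range(len(features)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            step += 1
            lr = 1.0 / (reg * step)
            y = 1.0 if labels[i] == 1 else -1.0
            margin = y * decision_score(w, b, features[i])
            # Shrink weights (L2 regularization), then correct margin violations.
            w = [wj * (1.0 - lr * reg) for wj in w]
            if margin < 1.0:
                w = [wj + lr * y * xj for wj, xj in zip(w, features[i])]
                b += lr * y
    return w, b


def decision_score(w, b, x):
    """Signed score; larger means more likely synthetic."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def auroc(scores, labels):
    """Chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy stand-in for frozen-backbone features: two clusters in 2-D.
rng = random.Random(1)
features, labels = [], []
for y in (0, 1):
    for _ in range(50):
        center = 2.0 if y == 1 else -2.0
        features.append([center + rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)])
        labels.append(y)

support = sample_few_shot(labels, shots_per_class=10)
w, b = train_linear_svm([features[i] for i in support], [labels[i] for i in support])
print(auroc([decision_score(w, b, x) for x in features], labels))
```

Because only the small head is trained, adaptation of this kind needs just 20 labeled images for the binary task, and the quality of the frozen representation determines how far such a head can go.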
Source: arXiv Generated at: 2026-06-02 00:00:00 UTC





