RAIGen: Rare Attribute Identification in Text-to-Image Generative Models
Title: RAIGen: Uncovering Rare Attributes in Text-to-Image Generative Models
Abstract:
While text-to-image diffusion models have demonstrated remarkable generation capabilities, they often perpetuate and exacerbate biases present in their training data, resulting in a skewed representation of semantic attributes. Existing research typically tackles this issue through one of two lenses: closed-set methods, which address biases within predefined fairness categories (such as race or gender) based on known socially significant minority attributes; or open-set methods, which treat the problem as bias identification by focusing on the majority attributes that overwhelmingly dominate model outputs. However, both approaches neglect a crucial complementary objective: the discovery of rare or minority features—whether social, cultural, or stylistic—that are underrepresented in the data distribution but still captured within model representations.
To address this gap, we present RAIGen, to our knowledge the first framework for label-free rare-attribute discovery in diffusion models; it requires no predefined minority categories. RAIGen combines Matryoshka Sparse Autoencoders with a novel minority-identification metric that integrates neuron activation frequency with semantic distinctiveness. This combination pinpoints interpretable neurons whose highest-activating images expose underrepresented attributes. Our experiments show that RAIGen identifies attributes that fall outside standard fairness categories in Stable Diffusion. The framework also scales to larger architectures such as SDXL, supports systematic auditing across models, and enables targeted amplification of rare attributes during generation. More details can be found at https://vssilpa.github.io/RAIGen_webpage/.
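The abstract does not give the form of the minority-identification metric, so the following is only an illustrative sketch of how activation frequency and semantic distinctiveness might be combined into a per-neuron score. Everything here is an assumption made for illustration: the function name `rarity_scores`, the inputs `acts` (sparse autoencoder activations, one row per image) and `embeds` (image embeddings, e.g. from an off-the-shelf encoder), the parameters `k` and `thresh`, and the product used to combine the two terms. It is not RAIGen's actual metric.

```python
import numpy as np

def rarity_scores(acts, embeds, k=3, thresh=0.0):
    """Hypothetical score: high for neurons that fire rarely but on
    images that are semantically far from the dataset average."""
    acts = np.asarray(acts, dtype=float)      # shape (n_images, n_neurons)
    embeds = np.asarray(embeds, dtype=float)  # shape (n_images, dim), nonzero rows

    # Unit-normalize embeddings so dot products are cosine similarities.
    unit = embeds / np.linalg.norm(embeds, axis=1, keepdims=True)
    global_mean = unit.mean(axis=0)
    global_mean /= np.linalg.norm(global_mean)

    # Activation frequency: fraction of images on which each neuron fires.
    freq = (acts > thresh).mean(axis=0)

    scores = np.zeros(acts.shape[1])
    for j in range(acts.shape[1]):
        if freq[j] == 0:
            continue  # a neuron that never fires gets score 0
        # Top-k activating images, restricted to those that actually fire.
        top = np.argsort(acts[:, j])[::-1][:k]
        top = top[acts[top, j] > thresh]
        centroid = unit[top].mean(axis=0)
        centroid /= np.linalg.norm(centroid)
        # Assumed distinctiveness: 1 - cosine similarity to the global mean.
        distinct = 1.0 - float(centroid @ global_mean)
        # Assumed combination: favor rare (low freq) and distinctive neurons.
        scores[j] = (1.0 - freq[j]) * distinct
    return freq, scores
```

Under these assumptions, ranking neurons by `scores` and inspecting the top-activating images of the highest-ranked ones would mirror the workflow the abstract describes; a real system would need to choose the embedding model, threshold, and weighting with care.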
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





