arXiv

Exposing Blindspots: Cultural Bias Evaluation in Generative Image Models

June 4, 2026 · Huichan Seo, Sieun Choi, Minki Hong, Yi Zhou, Junseo Kim, Lukman Ismaila, Naome Etori, Mehul Agarwal, Zhixuan Liu, Jihie Kim, Jean Oh

Title: Uncovering Hidden Prejudices: Assessing Cultural Bias in Generative Image Models

Generative image models produce visually impressive outputs, yet they often misrepresent cultural details. Prior work on cultural bias has focused mainly on text-to-image (T2I) systems, leaving image-to-image (I2I) editing tools largely unexamined. We address this gap with a unified evaluation framework that covers six countries, uses an 8-category, 36-subcategory schema, and includes era-specific prompts. The framework supports a standardized audit of both T2I generation and I2I editing whose results can be compared directly.
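As a rough illustration of how such a framework can align T2I prompts and I2I edit instructions cell by cell, the sketch below enumerates every (country, era, category, subcategory) combination, plus a country-agnostic variant. It is a minimal sketch under stated assumptions: the country names, category and subcategory labels, the 5/5/5/5/4/4/4/4 split of the 36 subcategories, the two era labels, and both prompt templates are hypothetical placeholders, not the benchmark's actual schema or wording.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional

# Hypothetical placeholders: the benchmark's real country list, schema labels,
# and era labels are not reproduced here.
COUNTRIES = [f"country_{i}" for i in range(1, 7)]  # six countries
SUBCATEGORY_COUNTS = [5, 5, 5, 5, 4, 4, 4, 4]      # 8 categories, 36 subcategories (assumed split)
SCHEMA = {
    f"category_{i}": [f"category_{i}/sub_{j}" for j in range(1, n + 1)]
    for i, n in enumerate(SUBCATEGORY_COUNTS, start=1)
}
ERAS = ["traditional", "modern"]                   # hypothetical era labels


@dataclass(frozen=True)
class PromptSpec:
    country: Optional[str]  # None marks a country-agnostic prompt
    category: str
    subcategory: str
    era: str

    def _target(self) -> str:
        where = f" in {self.country}" if self.country else ""
        return f"{self.subcategory}{where}, {self.era} era"

    def t2i_prompt(self) -> str:
        # Hypothetical template for text-to-image generation.
        return f"A photo of {self._target()}"

    def i2i_instruction(self) -> str:
        # Hypothetical template for editing a source image toward the same cell.
        return f"Edit the image so it depicts {self._target()}"


def build_grid(include_agnostic: bool = True) -> list[PromptSpec]:
    """Enumerate every (country, era, category, subcategory) cell."""
    countries: list[Optional[str]] = list(COUNTRIES)
    if include_agnostic:
        countries.append(None)
    return [
        PromptSpec(country, category, sub, era)
        for country, era in product(countries, ERAS)
        for category, subs in SCHEMA.items()
        for sub in subs
    ]


grid = build_grid()
print(len(grid))  # (6 countries + 1 agnostic) * 2 eras * 36 subcategories = 504
```

Keying both the T2I prompt and the I2I instruction to the same `PromptSpec` keeps generated and edited images in matching cells, which makes per-country, per-era, and per-category comparisons between the two settings straightforward.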

Using open-source models under fixed settings, we evaluate across countries, eras, and categories. Our evaluation combines standard automated metrics, a culture-aware retrieval-augmented visual question answering (VQA) evaluator, and judgments from native expert reviewers. For full reproducibility, we release the complete image set, the prompt libraries, and all configuration details.

The analysis yields three main findings:
1. Given country-agnostic prompts, models default to modern, Global-North-centric depictions, which blur the distinctions between cultures.
2. Iterative I2I editing progressively erodes cultural accuracy, even when conventional metrics stay flat or improve.
3. I2I models rely on superficial changes, such as recolored palettes or generic props, instead of context-appropriate, era-consistent edits, and they often leave the source identity unchanged when the target is a Global-South subject.
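The second finding implies that a multi-turn evaluation should track a conventional metric and a culture-specific score side by side, turn by turn. The sketch below shows one way to do that, assuming user-supplied `edit_fn` and `score_fn` callables (hypothetical stand-ins for an I2I model and for scorers such as an automated alignment metric and a culture-aware VQA check). The 0.05 tolerance and the use of the first edit as the baseline are likewise illustrative choices, not the paper's protocol.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class TurnScores:
    turn: int
    generic: float   # conventional automated score, higher is better (assumed)
    cultural: float  # culture-specific accuracy in [0, 1], higher is better (assumed)


def run_edit_chain(
    source: Any,
    instruction: str,
    edit_fn: Callable[[Any, str], Any],                   # hypothetical I2I model wrapper
    score_fn: Callable[[Any, str], tuple[float, float]],  # hypothetical (generic, cultural) scorer
    turns: int,
) -> list[TurnScores]:
    """Apply the same edit repeatedly; each turn edits the previous output."""
    history = []
    image = source
    for t in range(1, turns + 1):
        image = edit_fn(image, instruction)
        generic, cultural = score_fn(image, instruction)
        history.append(TurnScores(t, generic, cultural))
    return history


def divergent_turns(history: list[TurnScores], tol: float = 0.05) -> list[int]:
    """Turns where cultural accuracy fell more than `tol` below the first turn
    while the generic metric did not fall below its first-turn value."""
    if not history:
        return []
    base = history[0]
    return [
        s.turn
        for s in history[1:]
        if base.cultural - s.cultural > tol and s.generic >= base.generic
    ]


# Toy values for illustration only (not results from the paper).
toy = [
    TurnScores(1, 0.30, 0.80),
    TurnScores(2, 0.31, 0.78),
    TurnScores(3, 0.32, 0.70),
    TurnScores(4, 0.29, 0.60),
    TurnScores(5, 0.33, 0.55),
]
print(divergent_turns(toy))  # [3, 5]
```

Comparing each turn against the first edit, rather than against the unedited source, isolates drift introduced by repeated editing. Flagging only the turns where the generic metric holds steady targets exactly the case the finding describes, in which standard metrics alone would raise no alarm.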

These findings show that current systems are unreliable for culture-sensitive edits. By providing standardized data, prompts, and human evaluation protocols, this work establishes a reproducible, culture-focused benchmark for diagnosing and tracking cultural bias in generative image models.

Project page: https://seochan99.github.io/ECB

Source: arXiv · Generated at: 2026-06-04 00:00:00 UTC