Generating the Modal Worker: A Cross-Model Audit of Race and Gender in LLM-Generated Personas Across 41 Occupations
Title: Unmasking the Synthetic Employee: A Comparative Analysis of Racial and Gender Bias in LLM-Generated Profiles Across 41 Job Titles
Abstract:
As generative artificial intelligence becomes a primary tool for depicting individuals in professional contexts, it is imperative to scrutinize the racial and gender biases embedded within these portrayals. This study conducts a comprehensive audit of more than 1.5 million occupational personas created by four prominent large language models—GPT-4, Gemini 2.5, DeepSeek V3.1, and Mistral-medium—spanning 41 distinct U.S. job categories. By benchmarking these AI-generated profiles against data from the U.S. Bureau of Labor Statistics (BLS), we identify a significant discrepancy: the models produce demographic outputs with reduced variance compared to real-world statistics. Essentially, the AI compresses each occupation toward a singular, dominant demographic archetype, failing to capture the broader population-level diversity.
Decomposing these distortions into shifts and exaggerations exposes their specific nature. White workers are underrepresented by 31 percentage points, and Black workers by 9 percentage points. Conversely, Hispanic workers are overrepresented by 17 percentage points, and Asian workers by 12 percentage points. This pattern suggests that stereotype exaggeration intensifies existing occupational segregation. The resulting representations are often stark: housekeepers are depicted as almost exclusively Hispanic, while Black workers are nearly erased from numerous job categories. Because these biases appear across models with diverse institutional and cultural backgrounds, they point to shared structural sources of prejudice rather than isolated model-specific errors. We contend that auditing generative AI demands evaluation frameworks capable of assessing how synthetic populations systematically alter demographic visibility within social roles.
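As a rough illustration of the comparison described in the abstract, the sketch below computes per-group representation gaps in percentage points against a benchmark, along with a simple concentration measure per occupation. It is a minimal sketch under stated assumptions, not the authors' method: the function names, occupation and group labels, and all share values are hypothetical placeholders, and Shannon entropy is used here only as a stand-in for the "reduced variance" the abstract reports, since the abstract does not specify the metric.

```python
import math

# Hypothetical shares for illustration only. They are NOT figures from the
# study or from BLS; occupation and group names are placeholders.
BENCHMARK = {
    "occupation_a": {"group_1": 0.60, "group_2": 0.25, "group_3": 0.15},
    "occupation_b": {"group_1": 0.40, "group_2": 0.35, "group_3": 0.25},
}
GENERATED = {
    "occupation_a": {"group_1": 0.90, "group_2": 0.05, "group_3": 0.05},
    "occupation_b": {"group_1": 0.20, "group_2": 0.75, "group_3": 0.05},
}


def mean_pp_gap(generated, benchmark):
    """Average (generated - benchmark) share per group, in percentage points.

    Positive values mean the group is overrepresented in the generated
    personas relative to the benchmark; negative values mean underrepresented.
    """
    groups = sorted({g for shares in benchmark.values() for g in shares})
    gaps = {}
    for group in groups:
        diffs = [
            generated[occ].get(group, 0.0) - benchmark[occ].get(group, 0.0)
            for occ in benchmark
        ]
        gaps[group] = 100.0 * sum(diffs) / len(diffs)
    return gaps


def entropy_bits(shares):
    """Shannon entropy (bits) of one occupation's demographic distribution.

    Lower entropy means the distribution is more concentrated on a single
    group, which is one way to quantify compression toward an archetype.
    """
    return -sum(p * math.log2(p) for p in shares.values() if p > 0)


gaps = mean_pp_gap(GENERATED, BENCHMARK)
for group, gap in gaps.items():
    print(f"{group}: {gap:+.1f} pp")

for occ in BENCHMARK:
    print(
        occ,
        "benchmark entropy:", round(entropy_bits(BENCHMARK[occ]), 3),
        "generated entropy:", round(entropy_bits(GENERATED[occ]), 3),
    )
```

When every occupation's shares sum to one in both sources, the per-group gaps sum to zero, so a set of reported gaps that does not cancel usually indicates that some categories are not listed.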
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC



