arXiv

Synthetic Personalities: How Well Can LLMs Mimic Individual Respondents Using Socio-Economic Microdata?

June 4, 2026 · Leonard Kinzinger, Jochen Hartmann

Title: Synthetic Personalities: The Efficacy of LLMs in Replicating Individual Respondents via Socio-Economic Microdata

Abstract

While large language model (LLM) digital twins promise to expand and accelerate market research, current implementations fall largely into two categories: coarse persona bots driven by minimal demographic inputs, or highly granular individual twins derived from purpose-built surveys and interview records. Neither approach addresses the most critical operational need of marketing professionals: generating precise individual profiles from the large, heterogeneous panel data that companies already hold in customer relationship management (CRM) systems, loyalty programs, and recurring surveys.

To bridge this gap, we built detailed individual-level digital twins from the German Socio-Economic Panel (SOEP) and evaluated them in a fully crossed $3 \times 5 \times 2 \times 2$ experimental design (60 conditions), varying three open-weight LLMs, five levels of cumulative information depth (measured by normalized Shannon entropy), two techniques for embedding respondent information in the prompt, and two reasoning configurations. The evaluation comprises more than 2.1 million generated responses from 500 participants on 183 hold-out questions.
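The abstract does not say how normalized Shannon entropy is turned into an "information depth" level. The sketch below shows one plausible reading, assuming that each item's entropy is computed from its answer distribution across the panel and divided by the log of the number of answer options, and that a depth level keeps the longest prefix of items whose cumulative entropy stays within a given share of the total. The function names `normalized_entropy` and `select_items`, the item order, and the rounding slack are hypothetical, not the authors' pipeline.

```python
import math
from collections import Counter


def normalized_entropy(responses, n_options):
    """Shannon entropy of one item's answer distribution, scaled to [0, 1]
    by dividing by log(n_options), the maximum entropy for that item."""
    if n_options <= 1 or not responses:
        return 0.0
    n = len(responses)
    counts = Counter(responses)
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return max(0.0, h / math.log(n_options))


def select_items(items, depth):
    """Keep the longest prefix of `items` whose cumulative entropy stays
    within `depth` (a share in [0, 1]) of the total entropy of all items.

    Hypothetical: `items` is a list of (item_id, normalized_entropy) pairs
    already in the order in which information is added to a twin."""
    total = math.fsum(e for _, e in items)
    budget = depth * total * (1 + 1e-9)  # relative slack for float rounding
    selected, running = [], 0.0
    for item_id, e in items:
        running += e
        if running > budget:
            break
        selected.append(item_id)
    return selected
```

For instance, `select_items(items, 0.75)` would return the items up to the 75% entropy level. The abstract does not list the five depth levels, so any concrete grid passed to this function is an assumption.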

Our results indicate that twin quality improves with information depth, but returns diminish substantially beyond the 75th percentile of cumulative entropy. This level is a cost-efficient Pareto optimum relative to the best-performing 100% depth conditions. Replacing narrative persona summaries with raw dialog histories of past responses increased hold-out accuracy for every model-reasoning combination at full depth. By contrast, enabling an explicit thinking mode improved rank-order correlation but did not significantly change accuracy. The best configuration reached a hold-out accuracy of 78.8% and a Fisher-$z$-averaged correlation of $r = 0.590$ on the SOEP test set. These results suggest that the main constraints on twin-based market research are no longer data design limitations, but item volume, model choice, and specific construction-level decisions, which this study systematically maps out.
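The two headline metrics can be illustrated with a minimal sketch. It assumes that hold-out accuracy is the share of held-out answers a twin reproduces exactly, and that the reported $r$ pools per-unit correlations (for instance, one per respondent) on the Fisher $z$ scale, $z = \operatorname{artanh}(r)$, before transforming the mean back with $\tanh$. The abstract specifies neither definition, and the helper names and clipping bound are hypothetical.

```python
import math
from statistics import mean


def holdout_accuracy(predicted, actual):
    """Share of hold-out answers that the twin reproduces exactly."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need two non-empty answer lists of equal length")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


def fisher_z_mean(correlations, bound=0.999999):
    """Average correlations on the Fisher z scale and map the mean back to r.

    artanh(r) is infinite at |r| = 1, so each r is first clipped to
    [-bound, bound]; the bound is a hypothetical choice."""
    if not correlations:
        raise ValueError("need at least one correlation")
    zs = [math.atanh(max(-bound, min(bound, r))) for r in correlations]
    return math.tanh(mean(zs))
```

Pooling on the $z$ scale is the usual choice because $z$ is approximately normal with a variance that does not depend on the underlying correlation. As a result, averaging 0.2 and 0.8 this way gives about 0.57 rather than the raw mean of 0.5.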

Source: arXiv · Generated at: 2026-06-04 00:00:00 UTC