arXiv

Curated Synthetic Data Doesn't Have to Collapse: A Theoretical Study of Generative Retraining with Pluralistic Preferences

June 4, 2026 · Ali Falahati, Mohammad Mohammadi Amiri, Kate Larson, Lukasz Golab

Title: Avoiding Collapse in Curated Synthetic Data: A Theoretical Analysis of Generative Retraining with Diverse Preferences

Abstract: Recursively retraining generative models on their own outputs poses a significant challenge for representation. When synthetic outputs are filtered using a single, static reward signal, models frequently converge on a narrow range of outputs that over-optimize for that objective. Previous research has indicated that such collapse is inevitable unless real data is incorporated; we reexamine that assumption through the lens of alignment. We demonstrate that collapse can be mitigated by curating data with multiple reward functions. By formalizing the dynamics of recursive training under heterogeneous preferences, we prove that, provided certain conditions are met, the model converges to a stable distribution. This limiting distribution spreads probability mass across several high-reward regions, thereby preserving diversity. Furthermore, it corresponds to a weighted Nash bargaining solution, which gives a rigorous formal interpretation of how values are aggregated within synthetic retraining loops.
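The contrast the abstract describes can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the paper's model: outcomes are discrete, each hypothetical curator i reweights the current distribution by e^{r_i(x)}, and the retrained model is the weighted mixture of the curated distributions. The function name `retrain`, the reward values, and the weights are all illustrative choices.

```python
import numpy as np

def retrain(p, rewards, weights, steps=200):
    """Iterate a toy curated-retraining update on a discrete distribution.

    p:       initial probabilities over outcomes, shape (n,)
    rewards: one row of rewards per hypothetical curator, shape (k, n)
    weights: curator weights summing to 1, shape (k,)
    """
    p = np.asarray(p, dtype=float)
    tilt = np.exp(np.asarray(rewards, dtype=float))  # e^{r_i(x)} per curator
    w = np.asarray(weights, dtype=float)
    for _ in range(steps):
        z = tilt @ p                       # per-curator normaliser E_p[e^{r_i}]
        p = p * (w @ (tilt / z[:, None]))  # mixture of the curated distributions
        p = p / p.sum()                    # guard against floating-point drift
    return p

uniform = np.full(3, 1 / 3)

# One static reward: mass drifts onto the single highest-reward outcome.
single = retrain(uniform, [[1.0, 0.0, 0.0]], [1.0])

# Two curators who prefer different outcomes, weighted equally (assumed values).
plural = retrain(uniform, [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [0.5, 0.5])

print(np.round(single, 3))
print(np.round(plural, 3))
```

In this toy setup the single-reward run concentrates on one outcome, while the two-curator run settles on a distribution shared between both preferred outcomes. The mixture update is a multiplicative step on the weighted log-product sum_i w_i log E_p[e^{r_i}], which is one way to read the weighted Nash bargaining interpretation; whether the paper's dynamics take this exact form is an assumption here.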

Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC

Bloomberg

AI Concentration Risk Is the Problem: 3-Minutes MLIV

June 4, 2026

The article argues that AI concentration risk, rather than the technology itself, is the primary concern. It highlights ...

Reuters

Foxconn announces strategic collaboration with Intel on next-gen AI infrastructure

June 4, 2026

Foxconn and Intel announced a strategic partnership to develop next-generation AI infrastructure. This collaboration aim...

Bloomberg

SpaceX Seeks to Raise $75 Billion in Record IPO (Video)

June 4, 2026

SpaceX aims to raise a record $75 billion through an initial public offering. This historic IPO marks a significant...

Bloomberg

Broadcom AI Chip Outlook Disappoints Investors

June 4, 2026

Broadcom’s AI chip projections disappointed investors, dampening market sentiment. The outlook fell short of expectation...

Reuters

Europe's tech 'liberation day'? Computer says not yet

June 4, 2026

Europe’s expected tech breakthrough remains unrealized, as current systems indicate that a true "liberation day" has not...

Bloomberg

Hiranandani Group CEO on Powering India's Digital Future

June 4, 2026

Hiranandani Group CEO discusses driving India's digital transformation.
