arXiv

ContinuousBench: Can Differentially Private Synthetic Text Improve Capabilities?

June 2, 2026 · Peihan Liu, Lucas Rosenblatt, Weiwei Kong, Natalia Ponomareva, Gautam Kamath, Rachel Cummings, Roxana Geambasu, Yu Gan, Lillian Tsai, Alex Bie

Abstract:

While differentially private (DP) text synthesis offers a pathway to using sensitive datasets for model training, it remains unclear whether such synthetic data actually conveys novel knowledge and capabilities that exist only in those sources. Current evaluation methods cannot answer this question because they rely on tasks that models can often solve without any training; consequently, high performance on these benchmarks does not show that DP synthesis can effectively replace direct access to the original data. To address this gap, we present ContinuousBench, a dynamic benchmark that is automatically and continuously regenerated to measure the capability improvements derived from DP synthetic text. Every quarter, the benchmark releases a fresh iteration consisting of a previously unseen training corpus and a corresponding question-answering (QA) set. This QA set is designed to meet two criteria: it must be unsolvable without the corpus, and it must be learnable under DP constraints, which is ensured by backing the target knowledge with hundreds of independent records. Researchers can generate DP synthetic data from the provided training corpus and use our standardized training and evaluation framework to quantify the resulting gains. We demonstrate two tracks: Geminon, which features a procedurally generated dataset about fictional creatures, and News, which uses a stream of newly crawled public news articles. Our findings show that while standard benchmarks are largely saturated, non-private synthesis successfully transfers significant knowledge from the original corpus to models on ContinuousBench. In contrast, state-of-the-art DP synthesis methods generally fail to achieve this transfer, even at a privacy budget of $\varepsilon=100$.
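The two selection criteria for the QA set can be illustrated with a small filter. This is a minimal sketch under stated assumptions, not the benchmark's actual pipeline: the `QAItem` record, the `keep_item` and `capability_gain` helpers, and both thresholds are hypothetical. It assumes each question comes with the set of corpus records that state its answer and with the accuracy of a model that has no access to the corpus. The intuition for the second criterion is that DP bounds the influence of any single record, so a fact stated in only a handful of records is unlikely to survive DP synthesis, whereas a fact repeated across hundreds of independent records can.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration only; not taken from the paper.
MIN_SUPPORTING_RECORDS = 100     # "hundreds of independent records" -> at least 100 here
MAX_CLOSED_BOOK_ACCURACY = 0.05  # question must be (nearly) unsolvable without the corpus


@dataclass(frozen=True)
class QAItem:
    """Hypothetical QA record used only to illustrate the selection criteria."""
    question: str
    answer: str
    supporting_record_ids: frozenset  # ids of distinct corpus records stating the answer
    closed_book_accuracy: float       # accuracy of a model with no corpus access, in [0, 1]


def keep_item(item: QAItem) -> bool:
    """Keep a question only if it needs the corpus AND is plausibly learnable under DP."""
    unsolvable_without_corpus = item.closed_book_accuracy <= MAX_CLOSED_BOOK_ACCURACY
    learnable_under_dp = len(item.supporting_record_ids) >= MIN_SUPPORTING_RECORDS
    return unsolvable_without_corpus and learnable_under_dp


def capability_gain(acc_after: float, acc_before: float) -> float:
    """Accuracy improvement from training on (DP) synthetic text."""
    return acc_after - acc_before


if __name__ == "__main__":
    items = [
        QAItem("q1", "a1", frozenset(f"r{i}" for i in range(250)), 0.0),  # kept
        QAItem("q2", "a2", frozenset({"r1", "r2"}), 0.0),  # too few records to learn under DP
        QAItem("q3", "a3", frozenset(f"r{i}" for i in range(300)), 0.9),  # solvable closed-book
    ]
    print([it.question for it in items if keep_item(it)])  # ['q1']
```

Counting distinct record ids (a set rather than a list) reflects the "independent records" requirement: repeating the same record many times would not make a fact any more learnable under record-level DP.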

Source: arXiv