arXiv

ContinuousBench: Can Differentially Private Synthetic Text Improve Capabilities?

Abstract:

While differentially private (DP) text synthesis offers a pathway to utilizing sensitive datasets for model training, it remains uncertain whether such synthetic data actually conveys novel knowledge and capabilities inherent exclusively to those sources. Current evaluation methods are insufficient for answering this question because they depend on tasks that can often be solved without any training; consequently, high performance on these benchmarks does not confirm that DP synthesis can effectively replace direct access to original data. To address this gap, we present ContinuousBench, a dynamic benchmark that is automatically and continuously regenerated to assess the capability improvements derived from DP synthetic text. Every quarter, the benchmark releases a fresh iteration consisting of a previously unseen training corpus alongside a corresponding question-answering set. This QA set is specifically designed to meet two criteria: it must be unsolvable without the corpus and learnable under DP constraints, ensured by the fact that the target knowledge is backed by hundreds of independent records. Researchers can generate DP synthetic data from the provided training corpus and utilize our standardized training and evaluation framework to quantify the resulting gains. We demonstrate two specific tracks: Geminon, which features a procedurally generated dataset concerning fictional creatures, and News, which utilizes a stream of newly crawled public news articles. Our findings reveal that while standard benchmarks are largely saturated, non-private synthesis successfully transfers significant knowledge from the original corpus to models on ContinuousBench. In contrast, state-of-the-art DP synthesis methods generally fail to achieve this transfer, even when operating at a privacy budget of $\varepsilon=100$.
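
For readers unfamiliar with the notation, the privacy budget $\varepsilon$ quoted above presumably refers to the standard differential privacy guarantee. The abstract does not say which variant or which $\delta$ is used, so what follows is the textbook $(\varepsilon, \delta)$ definition, given as an assumption rather than as the paper's exact setup. Here $M$ is a randomized mechanism, $D$ and $D'$ are any two datasets differing in one record, and $S$ is any set of outputs; these symbols are illustrative and do not appear in the abstract.

```latex
% Textbook (epsilon, delta)-differential privacy; assumed, not taken from the paper.
\[
  \Pr\bigl[M(D) \in S\bigr] \;\le\; e^{\varepsilon}\,\Pr\bigl[M(D') \in S\bigr] + \delta
\]
% At the budget quoted in the abstract, the multiplicative factor is
\[
  e^{\varepsilon}\Big|_{\varepsilon = 100} = e^{100} \approx 2.7 \times 10^{43}.
\]
```

Under this reading, $\varepsilon=100$ is a very loose formal guarantee, which is why a failure to transfer knowledge even at that budget is notable.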


Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
