arXiv

Characterizing the Effect of Noise in Language Generation in the Limit

June 2, 2026 · Aaron Li, Ian Zhang · Original Source

Title: Characterizing the Effect of Noise in Language Generation in the Limit

Abstract:

Kleinberg and Mullainathan initially introduced a formal framework known as "language generation in the limit" to investigate language generation phenomena. Within this paradigm, an algorithm must produce unseen strings from an unknown target language in finite time, based on an enumeration of example strings provided by an adversary. Subsequent work by Li, Raman, and Tewari (2025) refined the concepts of uniform and non-uniform generation, while Raman and Raman (2025) proposed a noisy variant where the adversary may introduce extraneous strings. A key question arising from this noisy model is how to measure the impact of noise by analyzing the effect of each additional extraneous string.

This paper presents two complementary findings regarding this setting. First, we demonstrate that for both uniform and non-uniform generation, the inclusion of a single noisy string strictly diminishes the set of collections that can be generated. This finding resolves an open question posed by Raman and Raman (2025). Second, we establish that for both generation types, the ability to generate strings with one noisy input is equivalent to the ability to do so with any finite quantity of noise. This result stands in sharp contrast to the strict hierarchy of noisy generation limits previously demonstrated by Bai, Panigrahi, and Zhang (2026). Finally, building on these insights, we offer the first known characterization of non-uniform noise-dependent generatability.

Source: arXiv Generated at: 2026-06-02 00:00:00 UTC