Length Generalization Bounds for Transformers
Title: Establishing Length Generalization Limits for Transformer Architectures
Abstract:
A fundamental characteristic of robust learning algorithms is length generalization: the capacity to make accurate predictions on inputs of arbitrary length, even when trained only on inputs of limited length. Guaranteeing this capability requires length generalization bounds, which specify the training length beyond which such generalization is mathematically assured. This study addresses the open question of whether these bounds are computable for C-RASP, a language class closely connected to Transformer models. Previous work by Chen et al. gave a partial positive answer for single-layer C-RASP and, under certain constraints, for two-layer C-RASP; we resolve the open problem. Our main result is that computable length generalization bounds do not exist for C-RASP, and hence not for Transformers, already in the two-layer case. On the positive side, we show that a computable bound exists for the positive fragment of C-RASP, and that this fragment is equivalent to fixed-precision Transformers. For both positive C-RASP and fixed-precision Transformers, we show that the length complexity is exponential and prove that this bound is optimal.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC




