Rethinking the Role of Positional Encoding: Sliding-Window Transformers without PE Remain Turing Complete
Title: Reevaluating Positional Encoding: Sliding-Window Transformers Without PE Are Turing Complete
Abstract: The prevailing view holds that positional encoding (PE) is essential for transformers to handle ordered sequences, because next-token prediction without PE appears to treat the context tokens as permutation-invariant. Existing universality proofs rest on this assumption: they show that transformers with chain-of-thought can execute arbitrary computations, and are therefore Turing complete, only by exploiting positional information. We challenge this view in the setting of long-form reasoning, where generation proceeds through a finite sliding window. We argue that the sliding mechanism itself subtly breaks the permutation symmetry. To characterize the resulting expressive power, we introduce the HIST model, an abstract autoregressive model whose updates depend only on a constant-sized internal state and the histogram of token counts within the current window. We show that the HIST model is Turing complete: the evolution of the window makes it possible to identify the token that has just exited the window, which suffices to simulate Turing-complete Post machines. We further construct a sliding-window transformer over a fixed-size token alphabet that uses no PE and simulates the HIST model. These results indicate that positional encodings are not a prerequisite for universal computation in transformers, since the sliding window itself breaks permutation symmetry and supplies sufficient positional information.
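The counting argument behind the exited-token claim can be illustrated with a minimal sketch. It is not the paper's construction, only the underlying identity: when a full window slides by one step, the histogram after the step equals the histogram before it, plus the emitted token, minus the exited token. The function names `exited_token` and `slide` are hypothetical, as is the choice of a `deque` to model the window.

```python
from collections import Counter, deque


def exited_token(hist_before, emitted, hist_after):
    """Recover the token that left a full window when `emitted` was appended.

    Uses only token counts: hist_after = hist_before + emitted - exited.
    """
    expected = Counter(hist_before)
    expected[emitted] += 1
    # Counter subtraction keeps only positive counts, so exactly one
    # token (the one that exited) remains, with count 1.
    surplus = expected - Counter(hist_after)
    [(token, count)] = surplus.items()
    assert count == 1
    return token


def slide(window, emitted):
    """Append `emitted` to a full fixed-size window; return histograms before and after."""
    before = Counter(window)
    window.append(emitted)  # a deque with maxlen drops its oldest element
    after = Counter(window)
    return before, after


# Illustrative window of size 4 over the alphabet {a, b, c}.
window = deque("abac", maxlen=4)
before, after = slide(window, "b")
recovered = exited_token(before, "b", after)
print(recovered)
```

In this sketch the oldest token acts like the front of a queue and the emitted token like its back, which is the read/write pattern a Post machine needs. How the HIST model tracks the required information within a constant-sized state is left to the paper itself.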
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





