arXiv

Rethinking the Role of Positional Encoding: Sliding-Window Transformers without PE Remain Turing Complete

June 2, 2026 · Qian Li, Xinyu Mao, Shang-Hua Teng

Title: Reevaluating Positional Encoding: Sliding-Window Transformers Without PE Are Turing Complete

Abstract: The prevailing consensus holds that positional encoding (PE) is essential for transformers to process ordered sequences, since without PE, next-token prediction appears to treat the context tokens as permutation-invariant. This assumption has underpinned all prior universality proofs, which show that transformers with chain-of-thought can carry out arbitrary computation, and are therefore Turing complete, but only by exploiting positional information. We challenge this view in the setting of long-form reasoning, where generation proceeds over a finite sliding window. We argue that the sliding mechanism itself subtly breaks this permutation symmetry. To capture the resulting expressive power, we introduce the HIST model, an abstract autoregressive model whose updates depend only on a constant-size internal state and the histogram of token counts in the current window. We show that the HIST model is Turing complete: the evolution of the window allows the model to identify the token that has just left the window, and this capability suffices to simulate Post machines, which are Turing complete. We further construct a sliding-window transformer without PE, over a fixed-size token alphabet, that simulates the HIST model. These results show that positional encoding is not necessary for universal computation in transformers: the sliding window itself breaks permutation symmetry and supplies sufficient positional information.
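The symmetry-breaking observation can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the paper's construction: it assumes the window holds the last w tokens and that each step appends the generated token and evicts the oldest one. The helper `run_hist` and the transition rule `parity_delta` are hypothetical and chosen only for illustration. Each step sees only the histogram of the window (never its order), yet two windows that are permutations of each other evict different tokens, so their later histograms and outputs diverge. The sketch does not reproduce the paper's evicted-token identification or its Post-machine simulation.

```python
from collections import Counter, deque
from typing import Callable, Deque, List, Tuple

# Hypothetical transition: (state, window histogram) -> (new state, next token).
# It receives only the multiset of tokens in the window, never their order.
Transition = Callable[[int, Counter], Tuple[int, str]]


def run_hist(window: List[str], state: int, delta: Transition,
             steps: int) -> Tuple[List[Counter], List[str]]:
    """Run a HIST-style loop; return the histogram seen and token emitted per step.

    Assumption: the window size w is fixed to len(window); each generated
    token is appended and the oldest token is evicted.
    """
    win: Deque[str] = deque(window, maxlen=len(window))
    hists: List[Counter] = []
    tokens: List[str] = []
    for _ in range(steps):
        hist = Counter(win)            # order-free view of the current window
        hists.append(hist)
        state, token = delta(state, hist)
        tokens.append(token)
        win.append(token)              # maxlen makes the deque drop the oldest token
    return hists, tokens


def parity_delta(state: int, hist: Counter) -> Tuple[int, str]:
    # Illustrative rule (hypothetical, not from the paper): emit "a" when the
    # count of "a" in the window is even, otherwise "b". The constant-size
    # state is simply passed through unchanged in this toy.
    token = "a" if hist["a"] % 2 == 0 else "b"
    return state, token


if __name__ == "__main__":
    h1, out1 = run_hist(list("aab"), 0, parity_delta, steps=3)
    h2, out2 = run_hist(list("baa"), 0, parity_delta, steps=3)
    print(h1[0] == h2[0])  # True: same multiset before any sliding
    print(h1[1] == h2[1])  # False: "aab" evicted an "a", "baa" evicted a "b"
    print(out1)            # ['a', 'a', 'a']
    print(out2)            # ['a', 'b', 'a']
```

The two runs start from identical histograms, so the first emitted token is the same; the divergence comes entirely from which token the slide removes, which is the order information a histogram-only update cannot see directly.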

Source: arXiv · Generated at: 2026-06-02 00:00:00 UTC