Beyond Sinusoids: A Morlet Wavelet Framework for Transformer Positional Encoding
Abstract
Current transformer positional encoding methods, including sinusoidal and rotary (RoPE) approaches, implicitly assume that every position is equally local. They encode where a token is, but not how far its positional influence extends. To address this limitation, we introduce Morlet Positional Encoding (MoPE), which exploits the Morlet wavelet’s ability to minimize uncertainty in position and frequency simultaneously. In our framework, each embedding dimension learns its own frequency and locality bandwidth from the data.
Theoretically, we demonstrate that MoPE serves as a unifying basis: standard sinusoidal PE and the RoPE correlation kernel arise as limiting cases of MoPE when the locality constraint is removed (i.e., as $\sigma_i \rightarrow \infty$). Specifically, the phase component of MoPE exactly recovers the rotation angle found in RoPE, while its amplitude introduces a learned Gaussian locality kernel, a feature absent in traditional encodings.
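The limiting behaviour described above can be illustrated with a minimal numerical sketch. It assumes the textbook Morlet form, a Gaussian envelope $\exp(-p^2 / 2\sigma_i^2)$ multiplying a complex carrier $\exp(i \omega_i p)$, with $p$ read as a position or relative offset. The abstract does not give the exact MoPE parameterization, so the function name `morlet_pe`, the frequency schedule, and the fixed (rather than learned) values of $\omega_i$ and $\sigma_i$ are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def morlet_pe(positions, omega, sigma):
    """Hypothetical MoPE-style encoding (assumed form, not the paper's code).

    Each dimension i is a Gaussian envelope exp(-p^2 / (2 sigma_i^2))
    times a complex carrier exp(1j * omega_i * p).
    """
    p = np.asarray(positions, dtype=float)[:, None]       # shape (P, 1)
    envelope = np.exp(-(p ** 2) / (2.0 * sigma ** 2))      # shape (P, D)
    carrier = np.exp(1j * omega * p)                       # shape (P, D)
    return envelope * carrier


# Offsets centred on zero and a sinusoidal-PE-style frequency schedule
# (both chosen for illustration only).
positions = np.arange(-8, 9)
omega = 1.0 / (10000.0 ** (np.arange(4) / 4.0))

# Reference: pure complex sinusoid, i.e. no locality envelope at all.
sinusoid = np.exp(1j * omega * positions[:, None].astype(float))

local = morlet_pe(positions, omega, sigma=np.full(4, 2.0))   # narrow bandwidth
wide = morlet_pe(positions, omega, sigma=np.full(4, 1e6))    # sigma -> "infinity"

# The envelope is real and positive, so it never changes the phase.
phase_matches = np.allclose(np.angle(local), np.angle(sinusoid))

# With a very large sigma the envelope is ~1 and the sinusoid is recovered.
limit_matches = np.allclose(wide, sinusoid, atol=1e-9)
```

Under these assumptions, `phase_matches` checks that the envelope leaves the rotation angle $\omega_i p$ untouched, and `limit_matches` checks that a very large $\sigma_i$ reduces the encoding to the plain sinusoid. The only thing a finite $\sigma_i$ changes is the amplitude, which decays with distance from the centre.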
Empirically, integrating MoPE with Energy-Gated Attention yields a performance gain of +0.119 on the TinyShakespeare dataset compared to standard attention, surpassing the results achieved by either method individually. Further analysis of the learned parameters shows that all 128 frequency-bandwidth pairs converge toward the wavelet admissibility boundary. This empirical finding aligns with companion results regarding energy gating, pointing to a reproducible characteristic of character-level language signals that merits deeper investigation.
Source: arXiv Generated at: 2026-06-02 00:00:00 UTC





