You Can Learn Tokenization End-to-End with Reinforcement Learning
Abstract:
Although Large Language Model (LLM) architectures are moving toward increasingly end-to-end designs, tokenization remains a static, hardcoded compression stage in the training pipeline. Prior work has shown that this compression step can be integrated into the LLM architecture at scale, using heuristics to decide where token boundaries fall. Other work has tried to learn these boundaries with straight-through estimators, which recast the discrete problem of placing token boundaries as a continuous optimization problem.
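The straight-through idea can be illustrated on a single boundary decision. The sketch below is a minimal, hypothetical example, not the method of any particular prior work: it assumes one sigmoid-parameterized boundary logit and uses one common variant of the estimator. In that variant, the forward pass emits a hard 0/1 boundary and the backward pass substitutes the sigmoid's derivative for the step function's derivative, which is zero almost everywhere. The names `st_boundary` and `st_grad_of_squared_error`, and the squared-error loss, are illustrative assumptions.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def st_boundary(logit: float) -> tuple[float, float]:
    """Straight-through boundary decision (sigmoid-adjusted variant).

    Returns (hard_decision, surrogate_derivative), where the surrogate
    derivative stands in for d(hard_decision)/d(logit) in the backward pass.
    """
    p = sigmoid(logit)
    hard = 1.0 if p > 0.5 else 0.0  # forward: discrete 0/1 boundary
    # The step function's true derivative is zero almost everywhere, so the
    # estimator pretends the forward output was sigmoid(logit) instead.
    surrogate_grad = p * (1.0 - p)
    return hard, surrogate_grad


def st_grad_of_squared_error(logit: float, target: float) -> float:
    """Chain rule through the straight-through estimator for a toy loss.

    Hypothetical loss: (hard_decision - target) ** 2.
    """
    hard, surrogate = st_boundary(logit)
    dloss_dhard = 2.0 * (hard - target)
    return dloss_dhard * surrogate


if __name__ == "__main__":
    print(st_boundary(1.5))
    print(st_grad_of_squared_error(1.5, 0.0))
```

Because the backward pass differentiates a different function than the one the forward pass computes, the resulting gradient is biased. This is the continuous relaxation that score function estimates avoid.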
In this work, we propose an alternative: learning token boundaries with score function estimates. This approach has stronger theoretical grounding, because it directly optimizes the discrete choice of where to place token boundaries so as to minimize loss. However, we find that reinforcement learning techniques, in particular time discounting, are necessary to reduce the variance of these score function estimates to a practically manageable level. Our method outperforms prior straight-through estimation approaches both qualitatively and quantitatively at the 100-million-parameter scale.
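A score function (REINFORCE) estimator with time discounting can be sketched for a sequence of independent boundary decisions. This is a minimal illustration under stated assumptions, not the authors' implementation. Each position t has a logit z_t, a sampled boundary b_t ~ Bernoulli(sigmoid(z_t)), and a scalar per-position reward r_t (hypothetically, the negative language-modeling loss at that position). Under the sigmoid parameterization, d log P(b_t) / d z_t = b_t - sigmoid(z_t), so each logit's gradient estimate is its discounted return-to-go G_t times that score. All function names are hypothetical.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def discounted_returns(rewards: list[float], gamma: float) -> list[float]:
    """Return-to-go G_t = r_t + gamma * G_{t+1}, computed right to left."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def sample_boundaries(logits: list[float], rng: random.Random) -> list[int]:
    """Sample b_t ~ Bernoulli(sigmoid(z_t)) independently at each position."""
    return [1 if rng.random() < sigmoid(z) else 0 for z in logits]


def score_function_grad(
    logits: list[float],
    boundaries: list[int],
    rewards: list[float],
    gamma: float,
) -> list[float]:
    """Score-function estimate of d E[discounted return] / d z_t.

    Each entry is G_t * (b_t - sigmoid(z_t)). Use it for gradient ascent
    on reward (equivalently, descent on the negated reward).
    """
    returns = discounted_returns(rewards, gamma)
    return [g * (b - sigmoid(z)) for z, b, g in zip(logits, boundaries, returns)]


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [0.0, 1.0, -1.0, 0.5]
    boundaries = sample_boundaries(logits, rng)
    # Hypothetical per-position rewards, e.g. negative per-token loss.
    rewards = [-1.2, -0.8, -1.5, -0.9]
    print(boundaries, score_function_grad(logits, boundaries, rewards, gamma=0.9))
```

With gamma = 1 this reduces to the undiscounted return-to-go estimator. Setting gamma < 1 shrinks the weight that rewards far after position t contribute to that position's gradient, which lowers the estimator's variance at the cost of some bias.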
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC