Experience-Driven Dynamic Exits for LLMs with Reinforcement Learning
Title: Dynamic Exits for LLMs via Reinforcement Learning: An Experience-Based Approach
Abstract: Large Language Model (LLM) inference is bottlenecked by slow autoregressive decoding. Self-speculative decoding offers a path to acceleration, but its performance is often limited by static hyperparameters such as a fixed speculation length and a predetermined exit layer. To address this, we formulate the selection of these parameters as a Markov Decision Process and introduce LEDE, a framework based on offline reinforcement learning. At each generation step, LEDE uses a learned policy to choose the exit layer and speculation length from the local context of the sequence, balancing the computational cost of drafting against the quality of the drafted tokens. Experiments on Llama-2 and Llama-3 models show that LEDE achieves speedups of $2.0\times$ to $2.7\times$ over standard autoregressive decoding, while also outperforming static speculative baselines by an additional 17%.
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC
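
The per-step decision described in the abstract can be illustrated with a minimal sketch of a draft-and-verify loop in which a policy picks the exit layer and speculation length from the current context. This is not LEDE's implementation: the toy model (`full_next`, `draft_next`), the cost model, `NUM_LAYERS`, and both example policies are hypothetical stand-ins, and the hand-written `toy_policy` only takes the place of the learned offline-RL policy.

```python
from typing import Callable, List, Tuple

VOCAB = 50
NUM_LAYERS = 8  # hypothetical depth of the toy model

def full_next(ctx: List[int]) -> int:
    """Toy stand-in for the full model's greedy next token."""
    return (sum(ctx) * 31 + len(ctx) * 7) % VOCAB

def draft_next(ctx: List[int], exit_layer: int) -> int:
    """Toy early-exit draft: deeper exits agree with the full model more often."""
    target = full_next(ctx)
    agrees = (sum(ctx) + len(ctx)) % NUM_LAYERS < exit_layer
    return target if agrees else (target + 1) % VOCAB

# A policy maps the current context to (exit_layer, speculation_length).
Policy = Callable[[List[int]], Tuple[int, int]]

def speculative_decode(prompt: List[int], n_new: int,
                       policy: Policy) -> Tuple[List[int], float]:
    """Generate n_new tokens; return them with a cost in full-pass units."""
    tokens = list(prompt)
    cost = 0.0
    while len(tokens) - len(prompt) < n_new:
        exit_layer, spec_len = policy(tokens)
        # Draft phase: each drafted token costs a fraction of a full pass
        # (assumed cost model, proportional to the exit depth).
        draft: List[int] = []
        for _ in range(spec_len):
            draft.append(draft_next(tokens + draft, exit_layer))
            cost += exit_layer / NUM_LAYERS
        # Verify phase: modeled as one full pass over all drafted positions.
        cost += 1.0
        accepted: List[int] = []
        for tok in draft:
            target = full_next(tokens + accepted)
            if tok == target:
                accepted.append(tok)
            else:
                accepted.append(target)  # correct the first mismatch, then stop
                break
        else:
            # Every draft token was accepted: the verify pass yields one more.
            accepted.append(full_next(tokens + accepted))
        tokens.extend(accepted)
    return tokens[:len(prompt) + n_new], cost

def greedy(prompt: List[int], n_new: int) -> List[int]:
    """Reference autoregressive decoding with the full toy model."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(full_next(tokens))
    return tokens

def static_policy(ctx: List[int]) -> Tuple[int, int]:
    """Fixed exit layer and speculation length, as in a static baseline."""
    return 4, 4

def toy_policy(ctx: List[int]) -> Tuple[int, int]:
    """Hypothetical context-dependent rule standing in for a learned policy."""
    return (2, 2) if len(ctx) % 2 else (6, 5)
```

In this sketch the output matches greedy decoding under any policy, because every emitted token is checked against the full model; the policy only changes the cost. That separation is what makes the choice of exit layer and speculation length a pure efficiency decision that can be optimized per step.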





