Test-time reward-guided alignment of language models by importance sampling on pre-logit space
Title: Optimizing Language Model Alignment at Test Time via Importance Sampling in Pre-Logit Space
Abstract: The high computational cost of fine-tuning large language models (LLMs) has made test-time alignment an increasingly active research area. We introduce Adaptive Importance Sampling on Pre-logits (AISP), a test-time reward-guided alignment method based on sampling-based model predictive control with stochastic control inputs. AISP applies Gaussian perturbations to the pre-logits (the outputs of the penultimate layer) and maximizes the expected reward with respect to the mean of the perturbation. Our analysis shows that the optimal mean can be obtained by importance sampling with sampled rewards. In experiments, AISP achieves higher rewards than best-of-n sampling for a given number of samples, across varying sample counts, and higher rewards than other reward-based test-time alignment methods.
Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC
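
The importance-sampling update of the perturbation mean described in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation. It assumes exponentiated-reward weights with a temperature, as is common in sampling-based model predictive control, and it replaces the language model and reward model with a hypothetical two-dimensional "pre-logit" vector and a quadratic `toy_reward`. All function names and parameter values (`update_mean`, `run_sketch`, `sigma`, `temperature`) are illustrative assumptions.

```python
import math
import random


def toy_reward(z):
    """Hypothetical stand-in for a reward model: peaks at z = (1.0, -2.0)."""
    target = (1.0, -2.0)
    return -sum((zi - ti) ** 2 for zi, ti in zip(z, target))


def update_mean(mean, base, reward_fn, n_samples, sigma, temperature, rng):
    """One importance-sampling update of the Gaussian perturbation mean."""
    dim = len(mean)
    # Draw Gaussian perturbations centred on the current mean.
    samples = [
        [rng.gauss(mean[d], sigma) for d in range(dim)]
        for _ in range(n_samples)
    ]
    # Score each perturbed "pre-logit" vector with the reward function.
    rewards = [reward_fn([b + v for b, v in zip(base, s)]) for s in samples]
    # Assumed weighting: exp(reward / temperature), shifted by the maximum
    # reward for numerical stability before normalisation.
    r_max = max(rewards)
    weights = [math.exp((r - r_max) / temperature) for r in rewards]
    total = sum(weights)
    # New mean: reward-weighted average of the sampled perturbations.
    return [
        sum(w * s[d] for w, s in zip(weights, samples)) / total
        for d in range(dim)
    ]


def run_sketch(n_iters=20, n_samples=64, sigma=0.5, temperature=0.5, seed=0):
    """Iterate the update and return the final mean and its reward."""
    rng = random.Random(seed)
    base = [0.0, 0.0]  # toy "pre-logit" vector (assumption)
    mean = [0.0, 0.0]  # start with a zero-mean perturbation
    for _ in range(n_iters):
        mean = update_mean(
            mean, base, toy_reward, n_samples, sigma, temperature, rng
        )
    final_reward = toy_reward([b + m for b, m in zip(base, mean)])
    return mean, final_reward


if __name__ == "__main__":
    mean, reward = run_sketch()
    print("adapted mean:", mean)
    print("reward at adapted mean:", reward)
```

In this toy setting the mean drifts toward the region of high reward over successive iterations. In an actual language model the perturbation would presumably be applied to the penultimate-layer outputs during generation, and the reward would come from a reward model scoring the generated responses.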




