arXiv

Test-time reward-guided alignment of language models by importance sampling on pre-logit space

Title: Optimizing Language Model Alignment at Test Time via Importance Sampling in Pre-Logit Space

Abstract: The high computational cost of fine-tuning large language models (LLMs) has made test-time alignment an increasingly active area of research. In response, we introduce Adaptive Importance Sampling on Pre-logits (AISP), a novel test-time reward-guided alignment technique grounded in sampling-based model predictive control with stochastic control inputs. AISP applies Gaussian perturbations to the pre-logits (the outputs of the penultimate layer) and maximizes the expected reward with respect to the mean of the perturbation. Our analysis shows that the optimal mean can be obtained by importance sampling with the sampled rewards. Experimental results indicate that AISP achieves higher rewards than best-of-n sampling for the same number of samples, across varying sample counts, and delivers higher rewards than other reward-based test-time alignment approaches.
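The core update described in the abstract (perturb the pre-logits with Gaussian noise, score the samples with a reward, then re-estimate the perturbation mean by importance sampling) can be illustrated with a toy, single-step sketch. It assumes an MPPI-style exponential weighting exp(r / λ) with a hypothetical temperature `lam`, a random unembedding matrix `W`, and a hypothetical token-level `reward_fn`. The paper applies perturbations across a generated sequence and scores complete responses with a reward model, so this is an illustration of the idea, not the authors' implementation.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def importance_weights(rewards, lam):
    # Self-normalized weights proportional to exp(r / lam).
    # Assumption: MPPI-style exponential weighting; subtracting the max is for stability only.
    logw = (rewards - rewards.max()) / lam
    w = np.exp(logw)
    return w / w.sum()


def aisp_step(h, W, mu, reward_fn, rng, n_samples=32, sigma=1.0, lam=0.5):
    """One hypothetical update of the perturbation mean `mu`.

    h: pre-logit vector of shape (d,); W: toy unembedding matrix of shape (V, d).
    """
    # Sample Gaussian perturbations u ~ N(mu, sigma^2 I) of the pre-logits.
    u = mu + rng.normal(0.0, sigma, size=(n_samples, h.shape[0]))  # (n, d)
    probs = softmax((h + u) @ W.T)                                 # (n, V)
    # Decode one token per perturbed pre-logit and score it with the reward.
    tokens = np.array([rng.choice(W.shape[0], p=p) for p in probs])
    rewards = np.array([reward_fn(t) for t in tokens], dtype=float)
    # Importance-sampling estimate of the new mean: reward-weighted average of u.
    w = importance_weights(rewards, lam)
    return w @ u, rewards


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, V = 8, 5
    h = rng.normal(size=d)           # toy pre-logit vector
    W = rng.normal(size=(V, d))      # toy unembedding matrix
    reward = lambda t: 1.0 if t == 3 else 0.0  # hypothetical reward: prefer token 3
    mu = np.zeros(d)
    print("p(token 3) before:", softmax((h + mu) @ W.T)[3])
    for _ in range(20):
        mu, _ = aisp_step(h, W, mu, reward, rng)
    print("p(token 3) after: ", softmax((h + mu) @ W.T)[3])
```

Because the weights are normalized, the new mean is always a convex combination of the sampled perturbations; a smaller `lam` concentrates the weight on the highest-reward samples, while a larger one averages more broadly.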


Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC
