arXiv

Test-time reward-guided alignment of language models by importance sampling on pre-logit space

June 4, 2026 · Sekitoshi Kanai, Tsukasa Yoshida, Hiroshi Takahashi, Haru Kuroki, Kazumune Hashimoto

Title: Optimizing Language Model Alignment at Test Time via Importance Sampling in Pre-Logit Space

Abstract: The high computational cost of fine-tuning large language models (LLMs) has made test-time alignment an increasingly active area of research. To address this, we introduce Adaptive Importance Sampling on Pre-logits (AISP), a novel test-time reward-guided alignment technique based on sampling-based model predictive control with stochastic control inputs. AISP applies Gaussian perturbations to the pre-logits, which are the outputs of the penultimate layer, and maximizes the expected reward with respect to the mean of the perturbation. Our analysis shows that the optimal mean can be obtained by importance sampling with sampled rewards. Experimental results indicate that AISP outperforms best-of-n sampling in reward obtained for a given number of samples and attains higher rewards than other reward-based test-time alignment approaches.
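The abstract does not give the update rule itself, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a softmax-style importance weight exp(reward / temperature), as used in common sampling-based model predictive control methods, and replaces the reward model with a hypothetical toy function (toy_reward) on a two-dimensional "pre-logit" vector. All names and parameter values here are illustrative assumptions.

```python
import math
import random


def importance_weights(rewards, temperature=1.0):
    """Normalized weights proportional to exp(reward / temperature)."""
    top = max(rewards)  # subtract the max for numerical stability
    unnorm = [math.exp((r - top) / temperature) for r in rewards]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def update_mean(samples, rewards, temperature=1.0):
    """Reward-weighted average of the sampled perturbations."""
    weights = importance_weights(rewards, temperature)
    dim = len(samples[0])
    return [sum(w * s[d] for w, s in zip(weights, samples)) for d in range(dim)]


def toy_reward(pre_logit, target):
    """Hypothetical stand-in for a reward model: negative squared distance."""
    return -sum((p - t) ** 2 for p, t in zip(pre_logit, target))


def adapt(base, target, n_samples=32, n_iters=20, sigma=0.5,
          temperature=0.5, seed=0):
    """Iteratively re-estimate the mean of a Gaussian perturbation."""
    rng = random.Random(seed)
    mean = [0.0] * len(base)
    for _ in range(n_iters):
        # Draw perturbations around the current mean.
        samples = [[rng.gauss(m, sigma) for m in mean]
                   for _ in range(n_samples)]
        # Score each perturbed pre-logit vector with the toy reward.
        rewards = [toy_reward([b + v for b, v in zip(base, s)], target)
                   for s in samples]
        # Move the mean toward high-reward perturbations.
        mean = update_mean(samples, rewards, temperature)
    return mean


base = [0.0, 0.0]      # unperturbed toy pre-logits
target = [1.0, -1.0]   # point where the toy reward is highest
mean = adapt(base, target)
shifted = [b + m for b, m in zip(base, mean)]
print(toy_reward(base, target), toy_reward(shifted, target))
```

In this toy setting the adapted mean shifts the pre-logits toward the region the reward function prefers. In an actual language model the perturbation would be applied to penultimate-layer outputs during generation and the reward would come from a reward model scoring sampled responses; those parts are outside this sketch.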

Source: arXiv · Generated at: 2026-06-04 00:00:00 UTC

Bloomberg

Nvidia-Backed Robotics Startup Generalist AI Valued at $2 Billion

June 4, 2026

Nvidia-backed robotics startup Generalist AI has reached a $2 billion valuation. Founders Pete Florence, Andy Zeng, and ...

TechCrunch

Oura Ring 5 review: Thinner, lighter, better

June 4, 2026

The Oura Ring 5 is 40% smaller and lighter than its predecessor, offering superior comfort and a discreet, jewelry-like ...

Financial Times

How AI has de-skilled translation

June 4, 2026

AI fragments specialist translation into routine tasks, effectively de-skilling the profession. This shift reduces compl...

Bloomberg

Zurich Insurance Expands Data-Center Offering Beyond the US

June 4, 2026

Zurich Insurance Group is expanding its data center insurance products internationally, extending coverage beyond the Un...

Bloomberg

Emerging-Market Stocks Fall as Broadcom Miss Disrupts AI Trade

June 4, 2026

Broadcom’s earnings miss triggered a sell-off in AI stocks, dragging down emerging-market equities. This disruption high...

Bloomberg

Revolut Co-Founder, CTO Vlad Yatsenko to Step Down From Role

June 4, 2026

Revolut co-founder and CTO Vlad Yatsenko is stepping down from his executive role. The resignation marks a significant l...

Global News Digest
