Cross-Entropy Games and Frost Training
Abstract: This study introduces Frost Training, a novel approach designed to enhance Monte Carlo-based policy optimization within a broad category of LLM-as-a-judge tasks known as Cross-Entropy Games. At its core, the method leverages the gradient of the reward function in embedding space. This signal is best known from the Greedy Coordinate Gradient (GCG) jailbreaking technique; we provide the first evidence that it can also be used to accelerate model training. We evaluate our approach using GRPO training for maximum-likelihood infilling. Our results show that Frost Training significantly boosts the model’s capacity to produce high-scoring outputs, achieving higher maximum scores in a best-of-k framework while also improving computational efficiency.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
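
The abstract names the gradient of the reward in embedding space as the core signal, the same quantity GCG uses to rank token substitutions. The sketch below illustrates only that ranking step on a toy problem. It is not the Frost Training procedure, which the abstract does not specify. The quadratic `reward`, the random embedding `table`, and all function names are hypothetical stand-ins chosen so the gradient can be written analytically without a deep learning library.

```python
import random

def mean_embedding(embs):
    """Average the per-position embeddings into one vector."""
    n, d = len(embs), len(embs[0])
    return [sum(e[j] for e in embs) / n for j in range(d)]

def reward(embs, target):
    """Toy differentiable reward (assumption): negative squared distance
    between the mean embedding and a fixed target vector."""
    mean = mean_embedding(embs)
    return -sum((m - t) ** 2 for m, t in zip(mean, target))

def reward_grad(embs, target):
    """Analytic gradient of the toy reward w.r.t. each position's embedding.
    For this reward the gradient is identical at every position."""
    n = len(embs)
    mean = mean_embedding(embs)
    g = [-2.0 * (m - t) / n for m, t in zip(mean, target)]
    return [g[:] for _ in range(n)]

def rank_substitutions(ids, table, target, pos, top_k):
    """GCG-style ranking: first-order estimate of the reward gain from
    replacing the token at `pos` with each vocabulary token."""
    embs = [table[t] for t in ids]
    g = reward_grad(embs, target)[pos]
    old = table[ids[pos]]
    gains = [
        sum(gj * (new_j - old_j) for gj, new_j, old_j in zip(g, table[v], old))
        for v in range(len(table))
    ]
    order = sorted(range(len(table)), key=lambda v: gains[v], reverse=True)
    return [(v, gains[v]) for v in order[:top_k]]

# Hypothetical toy setup: 20-token vocabulary, 4-dimensional embeddings.
rng = random.Random(0)
VOCAB, DIM = 20, 4
table = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(VOCAB)]
target = [rng.gauss(0, 1) for _ in range(DIM)]
ids = [3, 7, 11]

candidates = rank_substitutions(ids, table, target, pos=1, top_k=5)
```

The estimated gains are a linearization, so a real search would re-score the top candidates with the true reward before accepting a swap, as GCG does.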




