arXiv

Success Conditioning as Policy Improvement: The Optimization Problem Solved by Imitating Success

Title: Policy Enhancement via Success Conditioning: Resolving the Optimization Puzzle of Imitating Success

Abstract:

Success conditioning is a widely used technique for improving policies: collect trajectories, keep those that reach a target outcome, and train the policy to imitate the actions taken in those successful runs. The idea appears under several names, including rejection sampling with supervised fine-tuning (SFT), goal-conditioned reinforcement learning (RL), and Decision Transformers, yet the optimization problem it solves has remained unclear.
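
The collect, filter, imitate loop described above can be sketched in a one-step (bandit) setting. This is a minimal illustration, not the paper's implementation: the function name `success_condition`, the two-action toy problem, and the use of empirical action frequencies as the "imitation" step (the maximum-likelihood fit for a tabular policy) are all assumptions made for the example.

```python
import random
from collections import Counter


def success_condition(policy, success_prob, n_rollouts, seed=0):
    """One round of success conditioning in a single-state problem.

    policy:       dict mapping action -> probability under the current policy
    success_prob: dict mapping action -> P(success | action)  (hypothetical environment)
    Returns the tabular policy fitted to the actions of successful rollouts.
    """
    rng = random.Random(seed)
    actions = list(policy)
    weights = [policy[a] for a in actions]

    kept = Counter()
    for _ in range(n_rollouts):
        # 1. Collect: sample an action from the current policy.
        a = rng.choices(actions, weights=weights)[0]
        # 2. Filter: keep the rollout only if it succeeded.
        if rng.random() < success_prob[a]:
            kept[a] += 1

    total = sum(kept.values())
    if total == 0:
        # No successes observed: nothing to imitate, so leave the policy unchanged.
        return dict(policy)

    # 3. Imitate: for a tabular policy, the maximum-likelihood fit to the
    #    kept actions is just their empirical frequency.
    return {a: kept[a] / total for a in actions}


# Toy problem (assumed values): "right" succeeds far more often than "left".
policy = {"left": 0.5, "right": 0.5}
success_prob = {"left": 0.2, "right": 0.8}
new_policy = success_condition(policy, success_prob, n_rollouts=20000)
print(new_policy)
```

With enough rollouts, the fitted policy approaches the current policy reweighted by each action's success probability and renormalized, so probability mass moves toward "right".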

In this work, we show that success conditioning exactly solves a trust-region optimization problem: it maximizes policy improvement subject to a $\chi^2$ divergence constraint whose radius is determined automatically by the data. This yields an identity: at every state, the relative policy improvement, the magnitude of the policy change, and a novel quantity we term "action-influence", which measures how random variation in action choice affects the probability of success, are exactly equal. Success conditioning is therefore a conservative improvement operator. Exact success conditioning cannot degrade performance or induce dangerous distribution shift, so when it fails it does so observably, by barely changing the policy. We also apply the theory to the common practice of return thresholding, showing that it can amplify improvement, but at the risk of misalignment with the true objective.
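
The three-way identity can be checked numerically in the simplest case, a single state with known per-action success probabilities. The sketch below rests on assumptions that the abstract does not spell out: exact success conditioning is taken to be the reweighting `policy[a] * q[a] / v`, the policy change is measured as the $\chi^2$ divergence of the new policy from the old one, and action-influence is taken to be the variance of the success probability under the policy divided by the squared success rate. The paper's multi-step definitions may differ.

```python
def success_conditioned(policy, q):
    """Exact success conditioning in one state: reweight each action by q[a]."""
    v = sum(policy[a] * q[a] for a in policy)  # success rate of the current policy
    return {a: policy[a] * q[a] / v for a in policy}, v


def chi2_divergence(new, old):
    """Chi-squared divergence of `new` from `old` over a shared action set."""
    return sum((new[a] - old[a]) ** 2 / old[a] for a in old)


# Assumed toy values: three actions with different success probabilities.
policy = {"a": 0.5, "b": 0.3, "c": 0.2}
q = {"a": 0.1, "b": 0.4, "c": 0.9}

new_policy, v = success_conditioned(policy, q)
v_new = sum(new_policy[a] * q[a] for a in policy)

# 1. Relative policy improvement.
relative_improvement = (v_new - v) / v
# 2. Magnitude of the policy change.
policy_change = chi2_divergence(new_policy, policy)
# 3. Action-influence (assumed form): Var_policy[q] / v^2.
action_influence = sum(policy[a] * (q[a] - v) ** 2 for a in policy) / v ** 2

print(relative_improvement, policy_change, action_influence)
```

In this setting all three quantities reduce to the same expression, so they agree up to floating-point error. If every action has the same success probability, all three are zero and the policy does not move, which matches the failure mode described above: little improvement shows up as little policy change.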


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
