Global News Digest

arXiv

Reinforcement Learning with Pairwise Preferences in Long-Term Decision Problems

Title: Pairwise Preference-Based Reinforcement Learning for Extended Time Horizons

Abstract:

Standard reinforcement learning frameworks generally aim to maximize the expected value of a scalar reward function. However, specifying goals through pairwise preferences is often more intuitive for users and can capture objectives that scalar rewards cannot represent. Consequently, reinforcement learning techniques that use pairwise preferences have attracted increasing attention. Existing approaches, however, are inefficient in problems with long time horizons. They also provide no guarantees on how Markov policies perform relative to history-dependent policies, which leaves a gap between theory and practice.

To address these limitations, we introduce the Markov decision contest, a novel problem model for reinforcement learning with pairwise preferences. We show that stationary Markov policies are optimal among all history-dependent policies. We also establish that a Markov decision contest can be solved exactly in polynomial time (the problem is in P) and that a simple iterative algorithm converges to an optimal policy at a sublinear rate. Finally, experiments on high-dimensional decision problems with long time horizons show that our approximate algorithm learns significantly more efficiently than previous methods.
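The abstract does not define the Markov decision contest or its iterative algorithm, so the following is only a loose, single-decision illustration and not the paper's method. It assumes a hypothetical table of pairwise preference margins over three options, with cyclic preferences that no scalar score can rank, and runs multiplicative-weights self-play, a standard technique swapped in here for illustration. For that technique, the exploitability of the average mixture is bounded on the order of 1/sqrt(T) after T steps, which is one example of a sublinear rate. All names and numbers below are assumptions.

```python
import math

# Hypothetical preference margins: A[i][j] = P(i preferred over j) - 0.5.
# The matrix is antisymmetric and cyclic (0 beats 1, 1 beats 2, 2 beats 0),
# so no scalar score can rank the three options consistently.
A = [
    [0.0, 0.1, -0.2],
    [-0.1, 0.0, 0.3],
    [0.2, -0.3, 0.0],
]


def advantages(p):
    """Expected preference margin of each pure option against the mixture p."""
    return [sum(A[i][j] * p[j] for j in range(len(p))) for i in range(len(p))]


def exploitability(p):
    """Largest margin any single option achieves against p (0 at the optimum)."""
    return max(advantages(p))


def self_play(steps):
    """Multiplicative-weights self-play; returns the average mixture."""
    n = len(A)
    eta = math.sqrt(math.log(n) / steps)  # step size tuned to the horizon
    p = [1.0 / n] * n
    total = [0.0] * n
    for _ in range(steps):
        total = [t + q for t, q in zip(total, p)]
        adv = advantages(p)
        # Shift weight toward options that are preferred over the current mixture.
        w = [q * math.exp(eta * a) for q, a in zip(p, adv)]
        z = sum(w)
        p = [x / z for x in w]
    return [t / steps for t in total]


if __name__ == "__main__":
    for steps in (100, 1000, 10000):
        avg = self_play(steps)
        print(steps, [round(x, 3) for x in avg], round(exploitability(avg), 4))
```

In this toy table the mixture (1/2, 1/3, 1/6) has zero exploitability, meaning no single option is preferred over it on average. The average iterate is returned because individual iterates of self-play can keep cycling around that point.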


Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
