Bayesian learning for the stochastic shortest path problem

Title: Applying Bayesian Learning to the Stochastic Shortest Path Problem

Abstract: Markov decision processes (MDPs) are widely used to model sequential decision-making problems. This study focuses on the stochastic shortest path (SSP) problem, defined as an infinite-horizon, undiscounted MDP with absorbing terminal states. We introduce a Bayesian framework for learning optimal decision strategies through interaction with the task. Our method aims to learn the optimal action-value function, denoted $Q^*$, but unlike many existing Bayesian techniques it avoids unrealistic modeling assumptions and ad-hoc approximations. Instead, we derive posterior beliefs for $Q^*$ directly from Bellman’s optimality equations. For deterministic rewards, we characterize the posterior as a distribution possessing a manifold density. To make inference more straightforward, we relax the likelihood function so that a Lebesgue density exists. This relaxation, however, introduces unidentifiability: the relaxed posterior may assign considerable probability mass to improper decision rules, whereas the exact posterior does not. We also compute the exact posterior probabilities of selecting optimal actions under a tabular parameterization of $Q^*$, with a Gaussian likelihood relaxation and a Gaussian prior, which is useful for benchmarking. Numerical experiments on variations of the Deep Sea benchmark corroborate our theoretical findings. Our results indicate that the proposed framework accurately quantifies uncertainty and is more data-efficient than other temporal-difference-based Bayesian methods. The paper concludes with suggestions for future research directions.
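For readers unfamiliar with the SSP setting, the sketch below illustrates the Bellman optimality equations that the abstract refers to, $Q^*(s,a) = r(s,a) + \max_{a'} Q^*(s',a')$ with $Q^* = 0$ at terminal states, for the deterministic, undiscounted case. It is not the paper's Bayesian method: it is plain Q-value iteration on a hypothetical four-state chain, and the function name `q_value_iteration`, the chain layout, and the reward of -1 per step are all assumptions made for illustration.

```python
import numpy as np

def q_value_iteration(next_state, reward, terminal, n_iters=1000, tol=1e-10):
    """Iterate Q(s,a) <- r(s,a) + max_a' Q(s',a') for a deterministic,
    undiscounted SSP. Terminal states are absorbing with value 0."""
    n_states, n_actions = next_state.shape
    q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        v = q.max(axis=1)          # greedy state values
        v[terminal] = 0.0          # terminal states carry no further cost
        q_new = reward + v[next_state]
        q_new[terminal] = 0.0
        converged = np.max(np.abs(q_new - q)) < tol
        q = q_new
        if converged:
            break
    return q

# Hypothetical chain (an assumption, not taken from the paper):
# states 0..3, state 3 is terminal; action 0 moves left (or stays at 0),
# action 1 moves right; every step has reward -1.
n = 4
next_state = np.array([[max(s - 1, 0), min(s + 1, n - 1)] for s in range(n)])
reward = -np.ones((n, 2))
terminal = np.array([False, False, False, True])

q_star = q_value_iteration(next_state, reward, terminal)
greedy_actions = q_star.argmax(axis=1)
```

In this toy chain, always moving left never reaches the terminal state, which is an example of an improper decision rule in the sense used in the abstract; the greedy policy with respect to the converged table moves right in every non-terminal state.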


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
