arXiv

Bayesian learning for the stochastic shortest path problem

June 4, 2026 · Chon Wai Ho, Sumeetpal S. Singh, Jiaqi Guo

Title: Applying Bayesian Learning to the Stochastic Shortest Path Problem

Abstract: Markov decision processes (MDPs) are frequently utilized to model sequential decision-making scenarios. This study concentrates on the stochastic shortest path (SSP) problem, defined as an infinite-horizon, undiscounted MDP featuring absorbing terminal states. We introduce a Bayesian framework designed to acquire optimal decision strategies via interaction with the task at hand. While our method aims to learn the optimal action-value function, denoted as $Q^*$, it distinguishes itself from numerous existing Bayesian techniques by avoiding unrealistic modeling assumptions and ad-hoc approximations. Instead, we directly derive posterior beliefs for $Q^*$ using Bellman’s optimality equations. In cases involving deterministic rewards, we characterize the posterior as a distribution possessing a manifold density. To enable more straightforward inference, we relax the likelihood function to ensure the existence of a Lebesgue density. However, this relaxation introduces unidentifiability challenges; specifically, the relaxed posterior may assign considerable probability mass to improper decision rules, whereas the exact posterior does not. Additionally, we compute the exact posterior probabilities for selecting optimal actions under a tabular parameterization of $Q^*$, employing a Gaussian likelihood relaxation and a Gaussian prior, a capability valuable for benchmarking purposes. Numerical experiments conducted on variations of the Deep Sea benchmark corroborate our theoretical findings. Our results indicate that the proposed framework accurately quantifies uncertainty and exhibits greater data efficiency compared to other temporal-difference-based Bayesian methodologies. The paper concludes with suggestions for subsequent research directions.
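For readers unfamiliar with the setting, the following is a minimal sketch of what $Q^*$ means in an undiscounted SSP with an absorbing terminal state. It is not the Bayesian method the abstract describes. It only solves the standard Bellman optimality equation, $Q^*(s,a) = r(s,a) + \sum_{s'} P(s' \mid s,a)\,\max_{a'} Q^*(s',a')$ with $Q^* = 0$ at terminal states, by plain value iteration when the model is fully known. The function name `q_value_iteration`, the arrays `P` and `R`, and the three-state toy chain are all hypothetical and chosen purely for illustration.

```python
import numpy as np

def q_value_iteration(P, R, terminal, max_iter=1000, tol=1e-10):
    """Value iteration for an undiscounted SSP (illustrative helper, not from the paper).

    P: array of shape (S, A, S), P[s, a, s2] = transition probability.
    R: array of shape (S, A), expected one-step reward.
    terminal: list of absorbing terminal state indices, where Q* is fixed to 0.
    """
    Q = np.zeros(R.shape)
    for _ in range(max_iter):
        V = Q.max(axis=1)        # greedy state values, max over actions
        V[terminal] = 0.0        # no further reward after absorption
        Q_new = R + P @ V        # Bellman optimality backup, no discount factor
        Q_new[terminal] = 0.0
        if np.max(np.abs(Q_new - Q)) < tol:
            return Q_new
        Q = Q_new
    return Q

# Toy chain (an assumption for illustration): states 0 and 1, terminal state 2.
# Action 0 moves one step right with certainty.
# Action 1 moves right with probability 0.5 and otherwise stays put.
# Every step from a non-terminal state has reward -1.
P = np.zeros((3, 2, 3))
P[0, 0, 1] = 1.0
P[1, 0, 2] = 1.0
P[0, 1, 0] = P[0, 1, 1] = 0.5
P[1, 1, 1] = P[1, 1, 2] = 0.5
P[2, :, 2] = 1.0                 # terminal state is absorbing
R = np.array([[-1.0, -1.0], [-1.0, -1.0], [0.0, 0.0]])

Q_star = q_value_iteration(P, R, terminal=[2])
print(Q_star)                    # rows: states, columns: actions
```

In this toy chain every policy reaches the terminal state with probability one, so the undiscounted iteration converges. The greedy action in both non-terminal states is action 0, with $Q^*(0,0) = -2$ and $Q^*(1,0) = -1$.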

Source: arXiv