Title: Randomized Least Squares Value Iteration is Inherently Joint Differentially Private
Abstract:
As reinforcement learning (RL) is increasingly deployed in sensitive domains such as healthcare and recommender systems, privacy-preserving methods have become critical for safeguarding user data. This study examines privacy-aware RL in the episodic setting, focusing on algorithms that use randomized exploration, such as Randomized Least Squares Value Iteration (RLSVI). The goal is to clarify the interplay between the randomness inherent in exploration and the noise required for privacy guarantees. We present a novel privacy analysis showing that the noise parameters in RLSVI, originally chosen for exploration, simultaneously ensure privacy. Specifically, we establish that RLSVI achieves $(\varepsilon(\delta),\delta)$-joint differential privacy in tabular Markov Decision Processes (MDPs), with privacy budget $\varepsilon(\delta) = \frac{2AK}{H^2\log(2HSA)} + 2\sqrt{\frac{2AK\log(1/\delta)}{H^2\log(2HSA)}}$. Here $S$ and $A$ denote the number of states and actions, respectively, $H$ is the episode length, and $K$ is the total number of episodes.
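To make the stated privacy budget concrete, the following minimal sketch evaluates $\varepsilon(\delta)$ directly from the formula in the abstract. The function name `rlsvi_privacy_budget` and the example parameter values are hypothetical and chosen only for illustration, and the sketch assumes that $\log$ denotes the natural logarithm, which the abstract does not specify.

```python
import math


def rlsvi_privacy_budget(S: int, A: int, H: int, K: int, delta: float) -> float:
    """Evaluate the epsilon(delta) expression quoted in the abstract.

    Assumption: log is the natural logarithm (not stated in the abstract).
    S, A: number of states and actions; H: episode length; K: number of episodes.
    """
    if min(S, A, H, K) < 1:
        raise ValueError("S, A, H and K must be positive integers")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")

    # Shared factor A*K / (H^2 * log(2*H*S*A)) that appears in both terms.
    base = (A * K) / (H**2 * math.log(2 * H * S * A))

    # epsilon(delta) = 2*base + 2*sqrt(2*base*log(1/delta))
    return 2.0 * base + 2.0 * math.sqrt(2.0 * base * math.log(1.0 / delta))


if __name__ == "__main__":
    # Hypothetical problem sizes, for illustration only.
    eps = rlsvi_privacy_budget(S=10, A=5, H=20, K=100, delta=1e-5)
    print(f"epsilon(delta) = {eps:.4f}")
```

Read off the formula, the first term grows linearly in $AK$ and shrinks with $H^2\log(2HSA)$, so under this expression more episodes loosen the guarantee while longer episodes tighten it.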
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





