arXiv

Target Updates May Stabilize Linear Q-Learning: Periodic and Soft Dynamics

June 3, 2026 · Donghwan Lee

Title: Stabilizing Linear Q-Learning via Periodic and Soft Target Updates

Abstract:

Periodic target updates in Q-learning and soft target updates in actor-critic methods are widely used as empirical stabilization techniques, yet their theoretical foundations remain only partially understood. This study gives a rigorous, exact analysis of both mechanisms for linear Q-learning, that is, Q-learning with linear function approximation. The analysis exploits the exact switched linear system (SLS) dynamics induced by the Bellman max operator and centers on the joint spectral radius (JSR) of the associated families of switching matrices.
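To make the JSR concrete: for a finite set of square matrices, every product of length k gives a two-sided bound, with the largest spectral radius of a product (raised to the power 1/k) bounding the JSR from below and the largest induced norm of a product (raised to the same power) bounding it from above. The sketch below brute-forces those standard bounds in Python with NumPy. The function name `jsr_bounds` and the example matrices are hypothetical, and this is not claimed to be the certificate construction used in the paper, which the abstract does not specify.

```python
import itertools
import numpy as np

def jsr_bounds(matrices, k):
    """Brute-force (lower, upper) bounds on the joint spectral radius.

    Enumerates all products of length k drawn from `matrices` (hypothetical
    helper for illustration; cost grows as len(matrices) ** k).
    """
    n = matrices[0].shape[0]
    lower, upper = 0.0, 0.0
    for combo in itertools.product(matrices, repeat=k):
        prod = np.eye(n)
        for A in combo:
            prod = A @ prod
        # Spectral radius of a product bounds the JSR from below.
        rho = np.max(np.abs(np.linalg.eigvals(prod)))
        lower = max(lower, rho ** (1.0 / k))
        # Spectral (induced 2-) norm of a product bounds it from above.
        upper = max(upper, np.linalg.norm(prod, 2) ** (1.0 / k))
    return lower, upper
```

An upper bound below 1 certifies that every switching sequence over the given matrices is contractive, which is the kind of statement a JSR condition expresses. The enumeration is exponential in k, so this is only practical for small matrix families and short products.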

Linear Q-learning is not guaranteed to converge in general. We nevertheless show that both periodic (hard) target updates and soft target updates ensure convergence to the exact projected Q-Bellman solution, provided that explicit conditions on the step size and on spectral properties hold. The main development treats deterministic linear Q-learning, where the effect of target updates is most transparent. Once a JSR certificate is established for the mean recursion in the deterministic case, the analysis carries over to the stochastic reinforcement learning setting by replacing the deterministic modes with sampled stochastic modes and adding an appropriate analysis of the stochastic noise.
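The two target-update schemes can be sketched for a deterministic (mean) recursion as follows. This is a minimal illustration under stated assumptions, not the paper's exact formulation: the update form, the names `greedy_backup` and `linear_q_learning`, and the parameters `period` and `tau` are all assumed here. Q-values are stored as a vector indexed by `state * n_actions + action`, `Phi` is the feature matrix, `D` is a diagonal matrix of state-action weights, and `P` maps each state-action pair to a distribution over next states.

```python
import numpy as np

def greedy_backup(q, P, R, gamma, n_actions):
    """Bellman optimality backup R + gamma * P * max_a q(s', a)."""
    v = q.reshape(-1, n_actions).max(axis=1)  # greedy value of each next state
    return R + gamma * P @ v

def linear_q_learning(Phi, D, P, R, gamma, n_actions, alpha, steps,
                      period=None, tau=None):
    """Deterministic linear Q-learning with a target parameter (assumed form).

    Pass `period` for periodic hard updates or `tau` for soft updates.
    """
    if (period is None) == (tau is None):
        raise ValueError("specify exactly one of `period` or `tau`")
    theta = np.zeros(Phi.shape[1])
    target = theta.copy()
    for k in range(1, steps + 1):
        # The bootstrapped value uses the target parameter, not theta.
        td = greedy_backup(Phi @ target, P, R, gamma, n_actions) - Phi @ theta
        theta = theta + alpha * Phi.T @ (D @ td)
        if period is not None:
            if k % period == 0:
                target = theta.copy()  # hard update: copy every `period` steps
        else:
            target = target + tau * (theta - target)  # soft update: slow tracking
    return theta
```

With tabular features (`Phi` the identity) the projected equation reduces to the ordinary Bellman optimality equation, so the returned parameter can be checked by measuring how closely it is a fixed point of `greedy_backup`. For general features, whether the iteration converges depends on the step size and spectral conditions the abstract refers to, which this sketch does not verify.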
