arXiv

Non-Uniform Noise-to-Signal Ratio in the REINFORCE Policy-Gradient Estimator

Abstract: Policy-gradient methods are a staple of reinforcement learning, yet practitioners frequently see training become unstable or stall as optimization advances. This study investigates the issue by analyzing the noise-to-signal ratio (NSR) of the policy-gradient estimator, defined as the estimator’s variance (noise) divided by the squared norm of the true gradient (signal). Our primary finding is that the NSR of the REINFORCE estimator can be characterized exactly in two settings: finite-horizon linear systems with Gaussian policies and linear state feedback, and finite-horizon polynomial systems with Gaussian policies and polynomial feedback. The exact characterization is obtained either through closed-form expressions or through numerical moment-evaluation algorithms, without relying on approximations. For broader settings with general nonlinear dynamics and highly expressive policies, including those with neural-network components, we establish a general upper bound on the variance. These analytical tools allow a direct assessment of how the NSR varies across policy parameters and how it changes along optimization trajectories, such as those produced by SGD or Adam. Our experiments reveal that the NSR landscape is markedly non-uniform: the NSR typically rises as the policy nears an optimal solution. Under certain conditions the NSR diverges, which can induce training instability and lead to policy collapse.
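
To make the NSR definition concrete, the following is a minimal Monte Carlo sketch and not the closed-form or moment-evaluation method described in the abstract. It assumes a hypothetical scalar linear system x_{t+1} = a*x_t + b*u_t with quadratic cost and a Gaussian policy u_t ~ N(k*x_t, sigma^2). The function names, parameter values, and the choice of cost are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def reinforce_samples(k, n_rollouts=20000, horizon=5, a=0.9, b=1.0,
                      q=1.0, r=0.1, sigma=0.5, x0=1.0, seed=0):
    """Draw single-rollout REINFORCE gradient samples for feedback gain k.

    Assumed (hypothetical) setup: scalar dynamics x' = a*x + b*u,
    stage cost q*x^2 + r*u^2, terminal cost q*x^2, policy u ~ N(k*x, sigma^2).
    """
    rng = np.random.default_rng(seed)
    x = np.full(n_rollouts, x0)
    score = np.zeros(n_rollouts)  # running sum of d/dk log pi(u_t | x_t)
    cost = np.zeros(n_rollouts)   # running total cost per rollout
    for _ in range(horizon):
        eps = rng.standard_normal(n_rollouts)
        u = k * x + sigma * eps
        # d/dk log N(u; k*x, sigma^2) = (u - k*x) * x / sigma^2 = eps * x / sigma
        score += eps * x / sigma
        cost += q * x**2 + r * u**2
        x = a * x + b * u
    cost += q * x**2  # terminal cost
    # REINFORCE estimator without baseline: (sum of scores) * (total cost)
    return score * cost

def estimate_nsr(k, **kwargs):
    """Empirical NSR: sample variance over squared sample-mean gradient."""
    g = reinforce_samples(k, **kwargs)
    signal = g.mean() ** 2   # squared norm of the estimated gradient (scalar case)
    noise = g.var(ddof=1)    # variance of the single-rollout estimator
    return noise / signal

if __name__ == "__main__":
    for k in (-0.2, -0.5, -0.8):
        print(f"k = {k:+.1f}  estimated NSR = {estimate_nsr(k):.3g}")
```

Sweeping `k` over a grid gives a rough picture of how the ratio varies across the parameter space. Because the denominator is itself a Monte Carlo estimate, the result becomes unreliable wherever the true gradient is close to zero, which is exactly the regime where an exact characterization is most useful.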


Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
