arXiv

Trace-Mediated Peak Bias: Bridging Temporal Credit Assignment and Cognitive Heuristics in Deep Reinforcement Learning

June 4, 2026 · Viktor Veselý, Aleksandar Todorov, Erwan Escudie, Matthia Sabatelli

Abstract:

Although temporal credit assignment is fundamental to both biological and artificial intelligence, its interplay with non-linear function approximation remains largely unexplored. In this study, we uncover a systematic failure mode in deep reinforcement learning (RL), which we term Trace-Mediated Peak Bias (TMPB). Specifically, we find that at intermediate eligibility trace depths, agents exhibit irrational preferences for trajectories containing high-magnitude reward "peaks," even when alternative paths offer higher cumulative returns. This phenomenon offers a mechanistic explanation for the Peak-End Rule, a cognitive bias in human memory whereby experiences are evaluated by their most intense moment and their final moment rather than by their integrated utility.
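To make the contrast concrete, the toy comparison below scores two hypothetical reward sequences in two ways: by cumulative (discounted) return, and by a Peak-End summary that averages the most intense reward with the final one, following the usual statement of the rule. The sequences and the `peak_end_score` helper are illustrative assumptions, not the environments or metrics used in the paper.

```python
def discounted_return(rewards, gamma=1.0):
    """Cumulative return sum_t gamma**t * r_t (gamma=1.0 gives the plain sum)."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def peak_end_score(rewards):
    """Hypothetical Peak-End summary: mean of the most intense reward (largest |r|) and the final reward."""
    peak = max(rewards, key=abs)
    return (peak + rewards[-1]) / 2.0


# Hypothetical reward sequences, chosen only for illustration.
peaked = [0.0, 0.0, 10.0, 0.0, 0.0]  # a single spike; total 10
steady = [3.0, 3.0, 3.0, 3.0, 3.0]   # no spike; total 15

print(discounted_return(peaked), discounted_return(steady))  # 10.0 15.0
print(peak_end_score(peaked), peak_end_score(steady))        # 5.0 3.0
```

The peaked sequence wins under the Peak-End summary (5.0 vs 3.0) while the steady sequence has the larger return (15 vs 10), which is the kind of preference reversal described above.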

Our analysis reveals that TMPB arises because eligibility traces amplify distal temporal-difference (TD) errors into "gradient shocks." Fixed-step-size stochastic gradient descent (SGD) cannot normalize these shocks, which results in a global overestimation of values. Adaptive optimizers, in contrast, alleviate the issue through second-moment normalization. These findings imply that human-like saliency distortions may arise naturally from the mathematical constraints of credit assignment in distributed systems, and they highlight adaptive optimization as a theoretical prerequisite for rational value estimation.
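The scale argument can be sketched with semi-gradient TD(λ) and accumulating eligibility traces. This is a simplification under stated assumptions: a linear value function with one-hot features (rather than a deep network), a deterministic chain whose only nonzero reward is a hypothetical terminal "peak," and an RMSProp-style running second moment standing in for adaptive optimizers in general. The helper `run_td_lambda` and all hyperparameters are hypothetical. It reports the largest single-coordinate parameter change, used here only as a rough proxy for a gradient shock.

```python
import numpy as np


def run_td_lambda(peak_reward, adaptive, n_states=5, episodes=20,
                  alpha=0.1, gamma=0.99, lam=0.9, beta=0.99, eps=1e-8):
    """Hypothetical sketch: semi-gradient TD(lambda), accumulating traces, deterministic chain.

    States 0..n_states-1 are visited in order each episode, and only the last
    transition pays `peak_reward`. With one-hot features the value estimate is
    v(s) = w[s]. Returns the largest absolute per-coordinate update seen.
    """
    w = np.zeros(n_states)        # value weights
    sq_avg = np.zeros(n_states)   # RMSProp-style running mean of squared update directions
    largest_step = 0.0
    for _ in range(episodes):
        z = np.zeros(n_states)    # eligibility trace, reset at episode start
        for s in range(n_states):
            x = np.zeros(n_states)
            x[s] = 1.0            # one-hot feature vector for state s
            terminal = s == n_states - 1
            r = peak_reward if terminal else 0.0
            v_next = 0.0 if terminal else w[s + 1]
            delta = r + gamma * v_next - w[s]   # TD error
            z = gamma * lam * z + x             # accumulating trace
            g = delta * z                       # trace-weighted update direction
            if adaptive:
                # Second-moment normalization: step size no longer scales with |g|.
                sq_avg = beta * sq_avg + (1.0 - beta) * g ** 2
                step = alpha * g / (np.sqrt(sq_avg) + eps)
            else:
                step = alpha * g                # fixed-step SGD
            w += step
            largest_step = max(largest_step, float(np.max(np.abs(step))))
    return largest_step


for reward in (1.0, 100.0):
    sgd = run_td_lambda(reward, adaptive=False)
    ada = run_td_lambda(reward, adaptive=True)
    print(f"peak={reward:>5}: SGD max step={sgd:.3f}, normalized max step={ada:.3f}")
```

Under fixed-step SGD the largest update grows in proportion to the peak reward (100× here, up to floating-point rounding, because the updates are linear in the rewards when the weights start at zero). With the second-moment normalizer, the running average satisfies `sq_avg >= (1 - beta) * g**2`, so each coordinate's step is bounded by `alpha / sqrt(1 - beta)` (1.0 for these settings) regardless of reward scale. This illustrates only the normalization argument; it does not reproduce the global overestimation reported in the paper.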

Source: arXiv · Generated at: 2026-06-04 00:00:00 UTC