Representation Signatures and Risk-Feedback Alignment in LLM Trading Agents
Abstract:
This study investigates the behavioral alignment and representation dynamics of large language model (LLM) agents operating within financial decision-making contexts. We introduce TradeArena, a comprehensive and auditable testbed for trading agents that features risk reporting, execution simulation, memory capabilities, and replayable trajectories. This platform enables a detailed analysis of how rationales, market positions, and interventions shift under conditions of market stress. The associated code and data artifacts are accessible via the TradeArena repository at https://github.com/weich97/TradeArena.git.
Our analysis identifies three pre-failure signatures in agent behavior: planning embeddings drift away from their normal centroids, fused plan-risk representations distinguish normal states from those preceding a drawdown, and local manifolds contract in effective rank. This pattern remains consistent across eight LLM trajectories and 80 rolling failure anchors, as observed through hash, LSA, Transformer, and white-box hidden-state probes.
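To make two of these signatures concrete, the following is a minimal sketch, not the TradeArena implementation. It assumes embeddings are stored as rows of a NumPy array, measures drift as cosine distance from a reference-window centroid, and uses the entropy-based definition of effective rank (the exponential of the Shannon entropy of the normalized singular values). The function names `centroid_drift` and `effective_rank` are hypothetical, and the abstract does not state which drift metric or rank definition the authors use.

```python
import numpy as np


def centroid_drift(reference: np.ndarray, current: np.ndarray) -> np.ndarray:
    """Cosine distance of each row in `current` from the centroid of `reference`.

    Assumption: drift is measured as cosine distance; the source does not
    specify the metric. Rows are embeddings, columns are dimensions.
    """
    centroid = reference.mean(axis=0)
    centroid = centroid / np.linalg.norm(centroid)
    unit_rows = current / np.linalg.norm(current, axis=1, keepdims=True)
    return 1.0 - unit_rows @ centroid


def effective_rank(embeddings: np.ndarray) -> float:
    """Entropy-based effective rank of a window of embeddings.

    Assumption: effective rank = exp(H(p)), where p is the singular-value
    spectrum of the mean-centered window normalized to sum to one.
    """
    centered = embeddings - embeddings.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    p = singular_values / singular_values.sum()
    p = p[p > 0]  # drop exact zeros so log(p) is defined
    return float(np.exp(-(p * np.log(p)).sum()))
```

Under these assumptions, a window whose embeddings collapse onto a few directions yields a lower `effective_rank` than a window that spreads across many directions, which is the kind of contraction the abstract describes before a drawdown.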
Stress tests involving CoT-free target weights, lexical controls, OHLCV noise, and false audits indicate that while rationale-level contraction disappears when rationales are removed, intent-space and fused signatures remain informative. Structured risk feedback serves as an external alignment signal without requiring fine-tuning; however, it does not universally boost performance. True audit feedback improves calibration for some models and returns for others. The tests also expose cases where placebo or hidden feedback yields higher short-horizon returns despite weaker alignment diagnostics.
In a 51-stock intraday experiment, we identify a correlation blind spot: LLM rationales often justify exposure to coupled assets that the risk layer subsequently clips. Finally, a financial-audit task suite reframes the evaluation criteria, shifting the focus from "which model trades best" to whether models can audit trajectories, adhere to execution boundaries, reproduce artifacts, and prevent claim overreach. These findings support a research-oriented conclusion rather than a profitability guarantee: auditable risk feedback and representation trajectories help reveal whether LLM financial reasoning is aligning, drifting, or failing.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





