Global News Digest

arXiv

Agentic Transformers Provably Learn to Search via Reinforcement Learning

Abstract: Tree search is a fundamental framework for many reasoning and decision-making tasks involving language agents, which must explore candidate actions, keep track of failed attempts, and backtrack to more promising options. Despite its prevalence, no theoretical framework currently explains how transformer-based policies acquire these search abilities through reinforcement learning (RL) training dynamics. To address this gap, we study a stochastic $k$-ary tree environment in which an agentic transformer interacts with the tree solely through its trajectory history and receives a terminal reward upon reaching a hidden goal leaf.

We demonstrate that a two-head transformer can execute randomized depth-first search (DFS). In this architecture, one head monitors the sequence of prior actions, while the other identifies failure states to initiate backtracking. By analyzing policy gradient training dynamics under a depth-wise curriculum, we show that this DFS mechanism arises in distinct stages from sparse RL feedback, independent of expert demonstrations. The trained policy displays depth generalization, successfully navigating deeper full trees despite being trained exclusively on depth-$1$ and depth-$2$ structures. Additionally, we find that when goal distributions are imbalanced, applying return discounting yields a ranked DFS policy that favors branches with higher probabilities. Collectively, these findings reveal a mechanistic normal form for transformer-based search, where specialized attention heads collaborate to distill decision-relevant information from context and translate it into agentic actions through RL training.
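The search behavior described above can be illustrated with a minimal plain-Python sketch of randomized depth-first search on a full $k$-ary tree with a hidden goal leaf. This is not the paper's transformer or its RL training procedure; the function name `randomized_dfs`, the encoding of nodes as tuples of child indices, and the step-counting convention are all assumptions made for illustration. The `path` list plays the role of the record of prior actions, and the `tried` map plays the role of the failure memory that triggers backtracking.

```python
import random


def randomized_dfs(k, depth, goal, rng):
    """Search a full k-ary tree of the given depth for a hidden goal leaf.

    Nodes are encoded as tuples of child indices from the root (an
    assumption of this sketch). Returns (steps, leaf), where steps counts
    every descent and every backtrack move.
    """
    path = []    # actions taken from the root to the current node
    tried = {}   # node -> set of child indices already attempted there
    steps = 0
    while True:
        node = tuple(path)
        if len(path) == depth:
            if node == goal:
                return steps, node   # terminal reward would be given here
            path.pop()               # failed leaf: backtrack to the parent
            steps += 1
            continue
        attempted = tried.setdefault(node, set())
        untried = [a for a in range(k) if a not in attempted]
        if not untried:
            if not path:
                raise ValueError("goal is not a leaf of this tree")
            path.pop()               # every child failed: backtrack again
            steps += 1
            continue
        action = rng.choice(untried)  # uniform choice among untried children
        attempted.add(action)
        path.append(action)
        steps += 1


if __name__ == "__main__":
    rng = random.Random(0)
    steps, leaf = randomized_dfs(k=3, depth=4, goal=(2, 0, 1, 1), rng=rng)
    print(f"reached {leaf} after {steps} moves")
```

Nothing in the loop depends on the depth beyond the leaf check, so the same rule applies unchanged to deeper trees. A ranked variant could replace the uniform `rng.choice` with an ordering of untried children by estimated goal probability; that ordering is likewise an assumption of this sketch rather than a detail given in the abstract.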


Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
