Global News Digest

arXiv

Test-Time Deep Thinking to Explore Implicit Rules

Title: Leveraging Test-Time Deep Thinking to Uncover Implicit Rules

Abstract: As Large Language Models (LLMs) continue to evolve, intelligent agents are gaining increasing prominence. Nevertheless, these agents frequently struggle in environments defined by implicit rules—unseen constraints that cannot be directly observed and must instead be deduced through interaction. Such difficulties often trap agents in cyclical trial-and-error patterns, resulting in task failure. To tackle this issue, we present TTExplore, a framework in which a "thinker" module examines interaction histories to deduce these hidden rules and direct an "actor." Success in this context relies heavily on the thinker's reasoning capabilities. However, assessing deep reasoning paths is inherently unstable and challenging, creating a significant barrier to effective training. We address this problem by introducing a new, stable reinforcement learning pipeline. The fundamental concept involves utilizing precise task-level scores as indirect rewards, thereby sidestepping the complexity of evaluating intermediate reasoning steps. Additionally, we limit each trajectory to a single thinking node to mitigate reward sparsity. Leveraging this approach, we trained a dedicated 7B model, Exp-Thinker. Evaluations across five text-based embodied tasks reveal that TTExplore, when paired with Exp-Thinker, enhances baseline agent performance by an average of 14–19 points, highlighting the efficacy of explicitly reasoning about implicit rules.
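The thinker/actor loop described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the `ToyEnv` environment, its hidden rule (the door opens only after the key is taken), the rule-based `thinker`, and the fixed-preference `actor` are all hypothetical stand-ins for LLM components. The sketch also calls the thinker at every step for simplicity and does not model the reinforcement learning pipeline; it only shows how a task-level score could be read off at the end of an episode.

```python
from dataclasses import dataclass


@dataclass
class Step:
    action: str
    succeeded: bool


class ToyEnv:
    """Hypothetical environment with one implicit rule:
    'open_door' silently fails until 'take_key' has been done."""

    def __init__(self):
        self.has_key = False
        self.done = False

    def step(self, action):
        if action == "take_key":
            self.has_key = True
            return True
        if action == "open_door" and self.has_key:
            self.done = True
            return True
        return False


def thinker(history):
    """Hypothetical rule-based stand-in for the thinker: reads the
    interaction history and returns actions to avoid for now."""
    blocked = set()
    for step in history:
        if step.succeeded:
            # The state changed, so earlier failures may no longer apply.
            blocked.clear()
        else:
            blocked.add(step.action)
    return blocked


def actor(preferences, blocked):
    """Hypothetical actor: takes its most preferred unblocked action."""
    for action in preferences:
        if action not in blocked:
            return action
    return preferences[0]


def run_episode(use_thinker, max_steps=6):
    """Run one episode and return (task-level score, steps taken)."""
    env = ToyEnv()
    history = []
    preferences = ["open_door", "take_key"]
    for _ in range(max_steps):
        blocked = thinker(history) if use_thinker else set()
        action = actor(preferences, blocked)
        history.append(Step(action, env.step(action)))
        if env.done:
            break
    score = 1.0 if env.done else 0.0
    return score, len(history)


if __name__ == "__main__":
    print("with thinker:   ", run_episode(use_thinker=True))
    print("without thinker:", run_episode(use_thinker=False))
```

Without the thinker, the toy actor repeats the same failing action until the step budget runs out, which mirrors the cyclical trial-and-error pattern the abstract mentions. With the thinker, the failed action is set aside until something in the environment changes.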


Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC

Related Articles

Schroders Renewable Unit Targets AI Assets as Power Demand Soars
Bloomberg

Schroders’ renewable unit targets AI infrastructure, pivoting to meet soaring energy demand from artificial intelligence...

State Street's Paglia on SBI Group Partnership, ETFs
Bloomberg

State Street's Paglia discusses the SBI Group partnership and ETFs.

Nvidia Boss Says Workers Should Be Paid ‘as Much as Possible’
Bloomberg

Nvidia CEO Jensen Huang advocates for paying workers “as much as possible,” emphasizing maximum compensation. This stanc...

TSE Talking With Regulator For Easing ETF Listing Rules
Bloomberg

The Tokyo Stock Exchange is in talks with regulators to ease ETF listing rules. This aims to simplify market access an...

S&P DJI CEO on Japan Markets, Mega IPOs
Bloomberg

S&P DJI CEO discusses Japan's financial markets and major IPOs.