Global News Digest

arXiv

MindGames Arena Generalization Track: In2AI Solution with Delayed Per-Step Reward Attribution

Abstract:

Developing language model agents for multi-agent strategic interaction is fundamentally hard because the value of an action often depends on subsequent events that may never occur, on rule-breaking moves, or on choices made by competing players. Conventional reinforcement learning assumes that a reward can be assigned at every step; this assumption breaks down in environments where outcomes are entangled across time and across agents. To address this, we present a framework for delayed per-step reward attribution with eligibility gating. An episode lifecycle and postprocessing pipeline computes rewards only once an episode has ended, traces them back to the steps that produced them according to task-specific semantics, and filters out any step that lacks the dependent information needed for valid training.
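As a rough illustration of the idea, the sketch below attributes rewards only after an episode has finished and then gates out steps that cannot be scored. It is not the authors' implementation: the `Step` fields, the `attribute_rewards` function, the use of final per-player scores, and the fixed penalty for invalid moves are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Step:
    """One recorded agent step (hypothetical structure for this sketch)."""
    player: int
    action: str
    valid: bool = True       # False for a rule-breaking move
    resolved: bool = True    # False if the events this step depends on never occurred
    reward: Optional[float] = None
    eligible: bool = False


def attribute_rewards(
    steps: List[Step],
    final_scores: Dict[int, float],
    invalid_penalty: float = -1.0,  # assumed value, not taken from the paper
) -> List[Step]:
    """Assign rewards after the episode ends and return only trainable steps."""
    for step in steps:
        if not step.valid:
            # Assumption: an invalid move can be scored on its own.
            step.reward = invalid_penalty
            step.eligible = True
        elif step.resolved:
            # Trace the end-of-episode outcome back to the acting player's step.
            step.reward = final_scores[step.player]
            step.eligible = True
        else:
            # Dependent information is missing, so the step is gated out.
            step.reward = None
            step.eligible = False
    return [s for s in steps if s.eligible]


episode = [
    Step(player=0, action="offer"),
    Step(player=1, action="bluff", resolved=False),
    Step(player=0, action="illegal-move", valid=False),
]
trainable = attribute_rewards(episode, final_scores={0: 1.0, 1: -1.0})
```

In this toy episode, the unresolved step is excluded from training, while the other two steps carry a reward that was computed only once the episode had ended.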

Combined with asynchronous rollout generation using vLLM’s continuous batching, curriculum-based opponent sampling, and multi-level stratified batch construction, this method enables stable, sample-efficient reinforcement learning in multi-agent settings. We evaluated the approach on the MindGames Arena benchmark at NeurIPS 2025. A single 8-billion-parameter open-source model trained with our method matched or exceeded significantly larger proprietary systems, such as GPT-5, in head-to-head play, and took first place in both the Open (unrestricted) and Efficient (≤8B parameters) tracks of the competition.
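The abstract does not spell out how the multi-level stratified batches are built. One plausible reading, sketched below under stated assumptions, is to group training samples by several keys at once and draw from the groups in round-robin order so that no single game or opponent dominates a batch. The key names `game` and `opponent` and the function `stratified_batch` are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def stratified_batch(
    samples: List[Dict],
    batch_size: int,
    keys: Sequence[str] = ("game", "opponent"),  # assumed stratification levels
    seed: int = 0,
) -> List[Dict]:
    """Draw a batch that cycles through every (game, opponent) stratum."""
    rng = random.Random(seed)

    # Group samples by the combination of all stratification keys.
    strata = defaultdict(list)
    for sample in samples:
        strata[tuple(sample[k] for k in keys)].append(sample)

    # Shuffle inside each stratum so the draw within a group is random.
    buckets = list(strata.values())
    for bucket in buckets:
        rng.shuffle(bucket)

    # Round-robin over strata until the batch is full or the data runs out.
    batch: List[Dict] = []
    while len(batch) < batch_size and any(buckets):
        for bucket in buckets:
            if bucket and len(batch) < batch_size:
                batch.append(bucket.pop())
    return batch


samples = [
    {"game": g, "opponent": o, "step_id": i}
    for g in ("game_a", "game_b")
    for o in ("self_play", "baseline")
    for i in range(3)
]
batch = stratified_batch(samples, batch_size=4)
```

With four strata and a batch size of four, each stratum contributes exactly one sample; asking for more samples than exist simply returns everything available.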


Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
