GIFT: Games as Informal Training for Generalizable LLMs

Title: GIFT: Leveraging Games as Informal Training for Generalizable LLMs

Abstract:

While Large Language Models (LLMs) have demonstrated exceptional proficiency in structured domains like code generation and mathematical reasoning, they continue to face challenges in broader competencies such as social intelligence, creativity, and planning. Drawing inspiration from human cognitive development—where both formal instruction and informal experiences contribute to intelligence—we propose integrating informal learning into LLM training frameworks. We utilize games as environments that provide feedback-driven signals without the need for manual annotation.

To encompass a wide spectrum of capabilities, including abstract reasoning, strategic planning, creative output, and social interaction, our approach merges traditional formal math tasks with three distinct game-based scenarios: Matrix Games, TicTacToe, and Who's the Spy. However, applying a unified Reinforcement Learning (RL) objective to this mixed dataset can obscure task-specific learning signals and lacks explicit mechanisms for coordinating gradients across different tasks.
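As a rough illustration of how a game can supply a training signal without manual annotation, the sketch below scores a finished TicTacToe board purely from the rules. The board encoding, the function names, and the +1/-1/0 reward values are assumptions made for this example; the abstract does not describe the reward design actually used.

```python
from typing import List, Optional

# Hypothetical board encoding: 9 cells in row-major order, each "X", "O", or None.
Board = List[Optional[str]]

WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def winner(board: Board) -> Optional[str]:
    """Return "X" or "O" if that player holds a full line, else None."""
    for a, b, c in WIN_LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None


def outcome_reward(board: Board, player: str) -> float:
    """Rule-derived reward (assumed values): +1 for a win, -1 for a loss, 0 otherwise."""
    w = winner(board)
    if w is None:
        return 0.0  # draw or unfinished game
    return 1.0 if w == player else -1.0
```

The point of the sketch is only that the environment's rules decide the outcome, so no human label is needed to produce the signal.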

To address these limitations, we introduce Coordinated Subtask Training (CST). This method substitutes the single, mixed update step with sequential, subtask-specific updates. This strategy effectively isolates heterogeneous RL signals while implicitly fostering coordination among the various subtasks. Our experiments on ability-oriented benchmarks reveal that informal learning through games boosts generalization capabilities beyond what formal training achieves in isolation. Furthermore, CST significantly enhances multi-task RL by maintaining strong in-domain performance on subtasks while simultaneously elevating broader general abilities. The associated code and data have been made publicly available.
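The contrast between a single mixed update and sequential, subtask-specific updates can be sketched on a toy problem. Everything below is an assumption for illustration: the parameters are a plain list of floats, each subtask is a quadratic loss with a made-up target, and plain gradient descent stands in for the RL objective. It is not the authors' implementation of CST, only a minimal picture of the two update schedules.

```python
from typing import Callable, Dict, List

Params = List[float]
GradFn = Callable[[Params], Params]


def sgd_step(params: Params, grad: Params, lr: float) -> Params:
    """Return new parameters after one gradient-descent step."""
    return [p - lr * g for p, g in zip(params, grad)]


def mixed_update(params: Params, grad_fns: Dict[str, GradFn], lr: float) -> Params:
    """One step on the average of all subtask gradients (the mixed baseline)."""
    grads = [fn(params) for fn in grad_fns.values()]
    avg = [sum(col) / len(grads) for col in zip(*grads)]
    return sgd_step(params, avg, lr)


def sequential_subtask_update(params: Params, grad_fns: Dict[str, GradFn], lr: float) -> Params:
    """One step per subtask, applied in order; each gradient is computed
    at the parameters left by the previous subtask's step."""
    for fn in grad_fns.values():
        params = sgd_step(params, fn(params), lr)
    return params


def quadratic_grad(target: Params) -> GradFn:
    """Gradient of the toy loss sum((p - t)^2); the targets are invented."""
    return lambda params: [2.0 * (p - t) for p, t in zip(params, target)]


# Hypothetical stand-ins for the four subtasks named in the abstract.
toy_subtasks: Dict[str, GradFn] = {
    "math": quadratic_grad([1.0, 0.0]),
    "matrix_games": quadratic_grad([0.0, 1.0]),
    "tictactoe": quadratic_grad([1.0, 1.0]),
    "whos_the_spy": quadratic_grad([-1.0, 0.5]),
}

theta0 = [0.0, 0.0]
theta_mixed = mixed_update(theta0, toy_subtasks, lr=0.1)
theta_sequential = sequential_subtask_update(theta0, toy_subtasks, lr=0.1)
```

In the sequential schedule every subtask's gradient is applied on its own rather than being averaged with the others, and later subtasks see the effect of earlier steps. That is one plausible reading of "isolates heterogeneous RL signals while implicitly fostering coordination", offered here as an interpretation rather than a statement of the paper's mechanism.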


Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC
