arXiv

NestRL: A Nested Training Regime for Mutual Adaptation in Human-AI Teaming

June 2, 2026 · Upasana Biswas, Durgesh Kalwar, Subbarao Kambhampati, Sarath Sreedharan


Abstract:

Mutual adaptation is a fundamental challenge in human-AI teaming, because humans tend to adjust their strategies in response to an AI’s behavior. Existing methods typically approximate human behavior by training against a diverse set of partners; however, these partners are generally fixed and so fail to capture the dynamic, responsive nature of human collaborators. Moreover, agents trained jointly in conventional multi-agent settings often converge to opaque coordination strategies that work only with their co-trained partners and therefore generalize poorly.

To capture adaptive human behavior, we model human-AI collaboration as an Interactive Partially Observable Markov Decision Process (I-POMDP). We introduce NestRL, a nested training framework for solving finite-level I-POMDPs, in which agents at each level are trained against adaptive agents from the level below. This exposes agents to adaptive behavior without letting them develop opaque, partner-specific coordination conventions. Our theoretical analysis shows that NestRL prevents convergence to strategies tailored exclusively to specific partners. We validate these findings empirically in the Overcooked environment against leading baselines. The results indicate that NestRL achieves better task performance with both novel adaptive agents and real human teammates, and adapts markedly better over the course of an interaction.
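
The level-by-level dependency in the nested scheme described above can be illustrated with a minimal sketch. This is not the authors' implementation: the toy payoff, the `best_response` stand-in for a reinforcement-learning run, and the names `nested_training` and `train_against` are all hypothetical, and the sketch assumes that a lower-level partner's policy is held fixed while the next level trains. It also omits within-episode adaptation by the partner, which the abstract treats as central; only the nesting structure is shown.

```python
from typing import Callable, List

# Hypothetical toy game (not from the paper): three actions, and the learner
# scores 1 only when it plays the action one step after its partner's (mod 3).
NUM_ACTIONS = 3


def payoff(agent_action: int, partner_action: int) -> int:
    return 1 if agent_action == (partner_action + 1) % NUM_ACTIONS else 0


def best_response(partner_action: int) -> int:
    """Stand-in for a full RL training run against a lower-level partner."""
    return max(range(NUM_ACTIONS), key=lambda a: payoff(a, partner_action))


def nested_training(
    level0: int,
    train_against: Callable[[int], int],
    max_level: int,
) -> List[int]:
    """Build policies level by level; level k trains only against level k-1."""
    policies = [level0]
    for level in range(1, max_level + 1):
        partner = policies[level - 1]  # assumption: partner is not updated here
        policies.append(train_against(partner))
    return policies


policies = nested_training(level0=0, train_against=best_response, max_level=3)
print(policies)  # [0, 1, 2, 0]
```

Because each level is trained only against the level below, rather than jointly with it, no pair of policies in this sketch is ever optimized together; that is the structural property the abstract credits with avoiding partner-specific conventions.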
