ConTraIRL: Factorized Contrastive Abstractions for Transferable IRL
Original (arXiv:2606.03017v1): Reward transfer in Inverse Reinforcement Learning (IRL) is unreliable when policies must generalize to unseen combinations of environment dynamics and task goals. We propose Factorized Contrastive Abstractions for Transferable IRL (ConTraIRL), a framework that enables compositional reward transfer by learning decoupled latent representations of these two factors. ConTraIRL uses a dual-encoder architecture that maps observations into separate dynamics and goal latent spaces, trained with a dual contrastive objective. Temporal alignment encourages the dynamics encoder to learn goal-invariant structure, while the goal encoder captures dynamics-invariant features. This factorization supports reward inference under recombined dynamics-goal settings. Experiments on continuous control benchmarks demonstrate effective few-shot transfer to unseen dynamics-goal pairings, improving sample efficiency and reward recovery over transfer IRL baselines.
Rewritten: Reward transfer in Inverse Reinforcement Learning (IRL) becomes unreliable when policies must generalize to unseen combinations of environment dynamics and task goals. To address this, we introduce ConTraIRL (Factorized Contrastive Abstractions for Transferable IRL), a framework that enables compositional reward transfer by learning separate, decoupled latent representations of these two factors.
ConTraIRL uses a dual-encoder architecture that maps observations into two separate latent spaces, one for dynamics and one for goals, and is trained with a dual contrastive objective. Temporal alignment encourages the dynamics encoder to learn goal-invariant structure, while the goal encoder captures dynamics-invariant features. This factorization supports reward inference when dynamics and goals are recombined in new configurations.
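To make the dual contrastive objective concrete, here is a minimal sketch and not the authors' implementation. It assumes linear encoders, an InfoNCE-style loss, and in-batch negatives. The function names (`linear_encode`, `info_nce`, `dual_contrastive_loss`), the temperature, and the way positive pairs are formed are all hypothetical, since the abstract does not specify them.

```python
import math


def linear_encode(weights, obs):
    """Apply a linear map (a list of rows) to an observation, then L2-normalize.

    Assumption: a linear encoder stands in for whatever network the paper uses.
    """
    z = [sum(w * x for w, x in zip(row, obs)) for row in weights]
    norm = math.sqrt(sum(v * v for v in z)) or 1.0  # avoid division by zero
    return [v / norm for v in z]


def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE-style loss: the i-th positive should match the i-th anchor.

    All other positives in the batch serve as negatives (an assumption).
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [
            sum(a * p for a, p in zip(anchor, pos)) / temperature
            for pos in positives
        ]
        # Numerically stable log-sum-exp over the batch.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denom - logits[i]
    return total / len(anchors)


def dual_contrastive_loss(w_dyn, w_goal, dyn_pairs, goal_pairs, temperature=0.1):
    """Sum of two contrastive terms, one per encoder.

    dyn_pairs:  (obs, obs') pairs assumed to share dynamics (hypothetical pairing).
    goal_pairs: (obs, obs') pairs assumed to share a goal (hypothetical pairing).
    """
    dyn_a = [linear_encode(w_dyn, a) for a, _ in dyn_pairs]
    dyn_p = [linear_encode(w_dyn, p) for _, p in dyn_pairs]
    goal_a = [linear_encode(w_goal, a) for a, _ in goal_pairs]
    goal_p = [linear_encode(w_goal, p) for _, p in goal_pairs]
    return (info_nce(dyn_a, dyn_p, temperature)
            + info_nce(goal_a, goal_p, temperature))
```

In this sketch each encoder only sees pairs that agree on its own factor, so lowering the loss pushes it to ignore the other factor. Whether the paper weights the two terms equally or builds pairs this way is not stated in the abstract.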
Experiments on continuous control benchmarks show effective few-shot transfer to previously unseen dynamics-goal pairings, with better sample efficiency and reward recovery than transfer IRL baselines.
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC



