OpenRFM: Dissecting Relational In-Context Learning
Abstract: Relational Foundation Models (RFMs) aim to offer a unified pre-trained predictor capable of delivering immediate predictions for any relational database through a single forward pass, leveraging relational in-context learning (ICL). However, a significant performance divide exists between open-source RFMs and their commercial equivalents, and the root causes of this disparity have yet to be systematically analyzed. This study investigates a prominent framework, the Relational Transformer (RT), through two distinct lenses. On the model architecture side, we demonstrate that RT executes relation-level ICL; a kernel regression analysis reveals that this approach falters when sparse label-cell coverage results in an underdetermined regression problem. Regarding data, we examine the impact of RT’s pre-training sources, discovering that synthetic-only versus in-distribution pre-training pushes the identical architecture into different learning regimes—specifically, lazy learning versus feature learning. Our investigation into this gap identifies a critical missing component: a support-identifiable relational latent within the label-generation process. These insights lead to two key innovations: (1) a dual-stage ICL architecture that integrates a batch-level ICL layer, derived from a pre-trained tabular foundation model, with the relational backbone to mitigate relation-level label scarcity; and (2) a pre-training strategy combining homophily-aware synthetic data with continual real-data exposure, enhanced by prototype-based regularization. Collectively, these elements define OpenRFM, a straightforward yet highly effective RFM that boosts average task performance by roughly 30% compared to the RT backbone and outperforms the commercial KumoRFMv1 across a broad spectrum of evaluation tasks.
Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC
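
The abstract's claim that sparse label-cell coverage leaves the regression problem underdetermined can be illustrated with a toy example. The sketch below is not the paper's kernel regression analysis or the RT architecture; it assumes a plain noise-free linear target and uses ordinary least squares as a stand-in. The function names, the feature dimension, and the support sizes are all hypothetical choices made for illustration. With fewer labeled support rows than feature dimensions, infinitely many weight vectors fit the support exactly, so the fit says little about unseen queries.

```python
import numpy as np


def fit_min_norm(support_x, support_y):
    """Least-squares fit on the labeled support rows.

    When there are fewer rows than columns, the system is underdetermined
    and np.linalg.lstsq returns the minimum-norm exact fit.
    """
    weights, _, rank, _ = np.linalg.lstsq(support_x, support_y, rcond=None)
    return weights, int(rank)


def query_error(n_labeled, dim=16, n_query=200, seed=0):
    """Mean squared error on held-out queries for a given support size.

    All sizes here are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    true_w = rng.normal(size=dim)                  # hidden linear target
    support_x = rng.normal(size=(n_labeled, dim))  # labeled context rows
    query_x = rng.normal(size=(n_query, dim))      # rows to predict
    weights, rank = fit_min_norm(support_x, support_x @ true_w)
    mse = float(np.mean((query_x @ weights - query_x @ true_w) ** 2))
    return mse, rank


# Sparse coverage: 4 labels for 16 features (underdetermined).
sparse_mse, sparse_rank = query_error(n_labeled=4)
# Dense coverage: 64 labels for 16 features (overdetermined).
dense_mse, dense_rank = query_error(n_labeled=64)

print(f"sparse: rank={sparse_rank}, query MSE={sparse_mse:.3g}")
print(f"dense:  rank={dense_rank}, query MSE={dense_mse:.3g}")
```

In this toy setting the sparse support matches its own labels perfectly yet recovers only a low-rank projection of the target, while the dense support pins the target down. This is the kind of label scarcity that the abstract says the batch-level ICL layer is meant to mitigate.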





