arXiv

When Tabular Foundation Models Transfer Across Modalities: A Systematic Evaluation Across 95 Datasets, 7 Modalities, and Two Regimes

June 2, 2026 · Julien Lafrance

Title: Cross-Modal Transfer Capabilities of Tabular Foundation Models: A Comprehensive Assessment Across 95 Datasets, 7 Modalities, and Two Operational Regimes

Abstract:

This study introduces a unified classification framework that combines an Equiangular Tight Frame (ETF) preprocessing phase with a tabular foundation model used for in-context inference. The same pipeline is applied unchanged to every data type, provided the inputs are first converted into fixed vector representations. We benchmark it on 95 datasets covering seven signal modalities: vision, audio, speech, text, molecular structures, time series, and tabular data.
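For readers unfamiliar with the term, the sketch below builds a simplex ETF, assuming the form familiar from the neural-collapse literature: K unit vectors whose pairwise inner products all equal -1/(K-1). It illustrates the geometric object only. The abstract does not specify how the ETF preprocessing phase uses it, and the names `simplex_etf` and `dot` are hypothetical.

```python
import math


def simplex_etf(num_classes):
    """Return K unit vectors in R^K forming a simplex ETF (one row per class)."""
    if num_classes < 2:
        raise ValueError("need at least two classes")
    k = num_classes
    scale = math.sqrt(k / (k - 1))
    # Row i is scale * (e_i - ones / k): a centered, rescaled one-hot vector.
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)]
        for i in range(k)
    ]


def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Gram matrix for K = 4: diagonal entries are 1, off-diagonal entries are -1/3.
vectors = simplex_etf(4)
gram = [[dot(u, v) for v in vectors] for u in vectors]
```

Every pair of class vectors is separated by the same angle, which is the largest equal separation that K unit vectors can achieve.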

The primary methodological innovation is a standardized comparison baseline. Throughout our analysis, performance is measured against the best lightweight tuned baseline that uses the same frozen features. Results for oracle selection, deployed selection, and specialized fine-tuning are reported separately so that this comparison stays consistent.
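The abstract does not define oracle and deployed selection. The sketch below assumes the usual reading: deployed selection chooses among candidates using only a signal available before deployment (here, a validation score), while oracle selection chooses with hindsight on test accuracy and therefore gives an upper bound. The candidate names and all numbers are made up for illustration and are not results from the paper.

```python
# Hypothetical (validation accuracy, test accuracy) pairs; not paper results.
candidates = {
    "variant_a": (0.81, 0.78),
    "variant_b": (0.79, 0.80),
    "variant_c": (0.76, 0.77),
}


def deployed_selection(scores):
    """Choose by validation accuracy only, then report that choice's test accuracy."""
    name = max(scores, key=lambda n: scores[n][0])
    return name, scores[name][1]


def oracle_selection(scores):
    """Choose by test accuracy directly: an upper bound unavailable in deployment."""
    name = max(scores, key=lambda n: scores[n][1])
    return name, scores[name][1]


deployed = deployed_selection(candidates)
oracle = oracle_selection(candidates)
```

With these illustrative numbers the two rules pick different candidates, which is why reporting them as separate metrics avoids crediting a method with a choice it could not have made in practice.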

Our findings indicate that the proposed pipeline is competitive with strong lightweight tuned baselines operating on identical frozen features. It does not surpass the best specialized models or heavily optimized pipelines in every scenario, but it stays close to them in accuracy while running much faster: typically 4 to 200 times faster than full backbone fine-tuning, often at comparable accuracy.

We provide practical guidance for implementing this pipeline: when to use ETF preprocessing, how to stop training without a validation split, how to configure the in-context classifier, and how to calibrate the output probabilities. Calibration is essential rather than cosmetic. TabICL on its own produces well-calibrated probabilities, but the ETF preprocessing step disrupts this property. A post-hoc rescaling step restores calibration and yields per-prediction confidence signals, which practitioners can use to set trust thresholds for confidence-gated deployment. We also identify scenarios where the pipeline offers limited benefit and outline how to detect them in advance.
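The abstract does not say which post-hoc rescaling is used. As a minimal sketch, the code below assumes temperature scaling, one common choice: a single scalar T is fitted on labelled calibration examples by minimizing negative log-likelihood, and a prediction is released only when its rescaled confidence clears a threshold. All function names, the grid bounds, and the toy logits are hypothetical.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_norm for z in scaled]


def mean_nll(logit_rows, labels, temperature):
    """Average negative log-likelihood of the true labels at this temperature."""
    total = sum(-log_softmax(row, temperature)[y] for row, y in zip(logit_rows, labels))
    return total / len(labels)


def fit_temperature(logit_rows, labels, low=0.05, high=20.0, steps=200):
    """Search a log-spaced grid for the temperature with the lowest NLL."""
    grid = [low * (high / low) ** (i / (steps - 1)) for i in range(steps)]
    return min(grid, key=lambda t: mean_nll(logit_rows, labels, t))


def gated_predict(logits, temperature, threshold):
    """Return (label, confidence); label is None when confidence is below threshold."""
    probs = [math.exp(lp) for lp in log_softmax(logits, temperature)]
    confidence = max(probs)
    label = probs.index(confidence)
    return (label if confidence >= threshold else None), confidence


# Toy calibration set (made up): very confident logits, but only 3 of 4 are correct.
calibration_logits = [[4.0, 0.0], [4.0, 0.0], [0.0, 4.0], [4.0, 0.0]]
calibration_labels = [0, 0, 1, 1]
temperature = fit_temperature(calibration_logits, calibration_labels)

accepted = gated_predict([4.0, 0.0], temperature, threshold=0.7)
abstained = gated_predict([4.0, 0.0], temperature, threshold=0.9)
```

On this toy set the fitted temperature is greater than 1, so the overconfident scores are softened toward the observed 75% accuracy. The same input is then accepted at a 0.7 threshold and withheld at 0.9.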

Source: arXiv Generated at: 2026-06-02 00:00:00 UTC