Retrieval-aligned Tabular Foundation Models Enable Robust Clinical Risk Prediction in Electronic Health Records Under Real-world Constraints
Title: Retrieval-Aligned Tabular Foundation Models Facilitate Resilient Clinical Risk Assessment in Electronic Health Records Within Real-World Limitations
Abstract: Deriving clinical predictions from structured electronic health records (EHRs) presents significant hurdles, including high dimensionality, data heterogeneity, class imbalance, and distribution shifts. Although tabular in-context learning (TICL) and retrieval-augmented approaches demonstrate strong performance on standard benchmarks, their efficacy in actual clinical environments remains poorly understood. To address this, we introduce a multi-cohort EHR benchmark that evaluates classical, deep tabular, and TICL models across diverse conditions, including varying data volumes, feature dimensions, outcome rarity, and cross-cohort generalization capabilities. Our findings indicate that while PFN-based TICL models exhibit sample efficiency in low-data scenarios, their performance deteriorates when employing naive distance-based retrieval methods as data heterogeneity and imbalance escalate. To overcome these limitations, we propose AWARE, a task-aligned retrieval framework that leverages supervised embedding learning and lightweight adapters. AWARE achieves an AUPRC improvement of up to 12.2% under conditions of extreme imbalance, with performance gains becoming more pronounced as data complexity increases. Ultimately, our study highlights that retrieval quality and the alignment between retrieval and inference processes are the primary bottlenecks hindering the deployment of tabular in-context learning for clinical risk prediction.
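The contrast the abstract draws between naive distance-based retrieval and task-aligned retrieval can be illustrated with a minimal sketch. This is not the AWARE implementation; the abstract does not specify its architecture. The function names `retrieve_context` and `fit_supervised_projection` are hypothetical, the toy data is synthetic, and the "supervised embedding" here is only a class-mean-difference direction standing in for a learned embedding.

```python
import numpy as np

def retrieve_context(query, X_train, k, projection=None):
    """Return indices of the k training rows nearest to `query`.

    With projection=None this is naive distance-based retrieval in raw
    feature space; otherwise distances are measured after a linear map.
    """
    if projection is not None:
        X_train = X_train @ projection
        query = query @ projection
    dists = np.linalg.norm(X_train - query, axis=1)
    return np.argsort(dists)[:k]

def fit_supervised_projection(X, y):
    """Hypothetical stand-in for supervised embedding learning:
    project onto the unit vector joining the two class means."""
    direction = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
    return (direction / np.linalg.norm(direction)).reshape(-1, 1)

# Synthetic, imbalanced toy cohort (assumption, not data from the paper):
# feature 0 carries the label signal, feature 1 is label-independent noise
# on a larger scale, as heterogeneous EHR features often are.
neg = np.column_stack([np.zeros(20), np.arange(20) - 9.5])
pos = np.column_stack([np.ones(4), np.array([-15.0, -5.0, 5.0, 15.0])])
X = np.vstack([neg, pos])
y = np.array([0] * 20 + [1] * 4)

query = np.array([1.0, 0.0])  # a positive-like patient

naive_idx = retrieve_context(query, X, k=4)
aligned_idx = retrieve_context(
    query, X, k=4, projection=fit_supervised_projection(X, y)
)

naive_positives = int(y[naive_idx].sum())
aligned_positives = int(y[aligned_idx].sum())
print("positives in naive context:  ", naive_positives)
print("positives in aligned context:", aligned_positives)
```

In this toy setting the raw-space neighbors are dominated by the noisy feature, so the retrieved context contains no rare-class examples, while the label-aware projection retrieves all of them. A downstream in-context model would then see a context that is more relevant to the prediction task.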
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC




