The Variance Brain Foundation Models Forgot: Third-Order Statistics Predict Cognition Where Billion-Parameter Models Fail
Abstract: Brain foundation models (BFMs), which are self-supervised Transformers trained on functional magnetic resonance imaging (fMRI) data, are theoretically capable of extracting individual cognitive performance metrics from these signals. However, our evaluation reveals a significant shortfall: across three leading BFM architectures and various readout methods, their predictive accuracy for cognition falls below that of a simple linear regression model utilizing only the ~80K parameters of the functional connectivity (FC) matrix. This performance deficit intensifies as model scale increases; for instance, BrainLM’s larger 650M-parameter model performs worse than its smaller 111M-parameter counterpart.
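To make the baseline concrete, here is a minimal sketch of FC features fed to a linear model. The parcel count of 400 is an assumption chosen only because it yields 400 × 399 / 2 = 79,800 unique FC entries, in line with the ~80K figure; the ridge penalty and the function names `fc_features` and `ridge_fit` are likewise illustrative and are not taken from the paper.

```python
import numpy as np

def fc_features(ts):
    """Vectorize functional connectivity for one subject.

    ts: (T, R) array of T time points by R parcels (R = 400 is an assumption).
    Returns the strictly upper triangle of the Pearson correlation matrix.
    """
    fc = np.corrcoef(ts, rowvar=False)        # (R, R) correlation matrix
    iu = np.triu_indices(fc.shape[0], k=1)    # unique off-diagonal entries
    return fc[iu]

def ridge_fit(X, y, alpha=1.0):
    """Ridge regression in dual form, suited to n_subjects << n_features.

    X: (n_subjects, n_features), y: (n_subjects,). The penalty alpha is a
    hypothetical choice; the abstract only says "simple linear regression".
    """
    X_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - X_mean, y - y_mean
    K = Xc @ Xc.T                              # (n, n) Gram matrix
    a = np.linalg.solve(K + alpha * np.eye(len(y)), yc)
    w = Xc.T @ a                               # primal weights
    b = y_mean - X_mean @ w                    # intercept
    return w, b
```

Predictions for new subjects would then be `X_new @ w + b`, with `alpha` selected by cross-validation.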
We identify this discrepancy as a variance allocation problem. BFM pretraining captures the dominant variance components of fMRI data but fails to retain the higher-order structure needed to predict cognition. A per-cumulant analysis of the reconstructed signals shows that second-order covariance is partially preserved, whereas the third-order co-skewness tensor is largely erased.
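A per-cumulant comparison of this kind can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis code: the relative Frobenius error and the names `coskewness` and `cumulant_rel_error` are hypothetical choices. The third-order cumulant used here equals the third central moment, C[i, j, k] = E[x_i x_j x_k] for mean-centered x.

```python
import numpy as np

def coskewness(ts):
    """Third-order cumulant (co-skewness) tensor of a (T, R) time series."""
    x = ts - ts.mean(axis=0)                   # center each parcel
    return np.einsum("ti,tj,tk->ijk", x, x, x) / x.shape[0]

def cumulant_rel_error(orig, recon):
    """Relative Frobenius error of the 2nd- and 3rd-order cumulants.

    orig, recon: (T, R) arrays holding the original signal and a model's
    reconstruction. The error metric is an assumption made for this sketch.
    """
    cov_o = np.cov(orig, rowvar=False)
    cov_r = np.cov(recon, rowvar=False)
    sk_o, sk_r = coskewness(orig), coskewness(recon)
    return {
        "cov": np.linalg.norm(cov_o - cov_r) / np.linalg.norm(cov_o),
        "coskew": np.linalg.norm(sk_o - sk_r) / np.linalg.norm(sk_o),
    }
```

The two orders can diverge completely: negating a signal leaves its covariance unchanged but flips the sign of every co-skewness entry, so a reconstruction can agree at second order while disagreeing at third order.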
To address this loss, we developed a linear pipeline that projects fMRI signals into a subspace optimized to preserve co-skewness and computes FC within that subspace. This approach surpasses both raw FC and all tested pretrained BFMs across every dataset and parcellation scheme examined, and it outperforms previous state-of-the-art methods under controlled conditions while requiring neither pretraining nor a GPU. Furthermore, we demonstrate that the primary limitation lies in the pretraining objective rather than in model architecture or size: fine-tuning BrainLM with a loss function targeted at this co-skewness-preserving subspace restores the raw-FC performance ceiling on its forward pass.
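The abstract does not say how the co-skewness-preserving subspace is obtained. One plausible construction, shown purely as an assumption, takes the leading eigenvectors of the mode-1 unfolding of the co-skewness tensor and then computes FC on the projected signals. The names `coskew_subspace` and `subspace_fc` and the subspace dimension `k` are hypothetical.

```python
import numpy as np

def coskew_subspace(ts, k):
    """Orthonormal basis (R, k) capturing the most co-skewness energy.

    Assumed method: eigendecomposition of M @ M.T, where M is the (R, R*R)
    mode-1 unfolding of the co-skewness tensor. Not the paper's stated method.
    """
    x = ts - ts.mean(axis=0)
    T, R = x.shape
    C = np.einsum("ti,tj,tk->ijk", x, x, x) / T
    M = C.reshape(R, R * R)                    # mode-1 unfolding
    evals, evecs = np.linalg.eigh(M @ M.T)     # eigenvalues in ascending order
    return evecs[:, ::-1][:, :k]               # top-k eigenvectors

def subspace_fc(ts, W):
    """Project (T, R) signals onto W and return the vectorized FC there."""
    z = (ts - ts.mean(axis=0)) @ W             # (T, k) projected signals
    fc = np.corrcoef(z, rowvar=False)
    return fc[np.triu_indices(W.shape[1], k=1)]
```

In practice `W` would presumably be estimated once from training subjects (for example, on concatenated time series) and reused for every subject, so that the FC features are comparable across individuals. Everything here is linear algebra on small matrices, consistent with the claim that no GPU is required.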
Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC




