Building The Ph(ysical)AI Layer Of Machine Intelligence
Source: arXiv:2606.04106v1
While foundation models typically rely on massive-scale training across varied datasets to achieve generalization, they often struggle to transfer knowledge to completely unseen domains when paired training data is unavailable. To address this, we introduce principle-driven foundation models that embed signal-theoretic fundamentals—such as symmetry, energy conservation, and Fourier decomposition—rather than relying on unanchored statistical correlations.
Our central hypothesis posits that differences between domains are not rooted in distinct fundamental physics, but rather in learnable transformations affecting time, frequency, magnitude, or phase. By training exclusively on radio-frequency (RF) data and utilizing a co-designed architecture and loss functions that integrate these principles, we demonstrate the ability to transfer knowledge across modalities, including audio, images, text, and video. This cross-modal transfer is achieved using only frozen representations derived from RF data, eliminating the need for fine-tuning the encoder on target domains.
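To make the hypothesis concrete, the following minimal sketch uses standard discrete Fourier transform identities (it is not the paper's architecture or loss). It shows that a circular time shift leaves the spectrum's magnitude unchanged and appears only as a linear phase ramp, and that signal energy is the same in the time and frequency domains (Parseval's theorem). The signal, its length `n`, and the value of `shift` are illustrative assumptions.

```python
import numpy as np

# Illustrative assumptions: a random real signal and an arbitrary shift.
rng = np.random.default_rng(0)
n = 256
shift = 17
x = rng.standard_normal(n)

# A "domain difference" modeled as a circular time shift of the same signal.
x_shifted = np.roll(x, shift)

X = np.fft.fft(x)
X_shifted = np.fft.fft(x_shifted)

# 1) The shift does not change the magnitude spectrum.
magnitude_preserved = np.allclose(np.abs(X), np.abs(X_shifted))

# 2) The shift is exactly a linear phase ramp in frequency:
#    X_shifted[k] = X[k] * exp(-2j * pi * k * shift / n)
k = np.arange(n)
phase_ramp = np.exp(-2j * np.pi * k * shift / n)
shift_is_phase_ramp = np.allclose(X_shifted, X * phase_ramp)

# 3) Energy conservation (Parseval): sum |x|^2 == (1/n) * sum |X|^2
energy_time = np.sum(x ** 2)
energy_freq = np.sum(np.abs(X) ** 2) / n
energy_conserved = bool(np.isclose(energy_time, energy_freq))

print(magnitude_preserved, shift_is_phase_ramp, energy_conserved)
```

Under these identities, two recordings that differ only by a delay share the same magnitude spectrum, which is one simple instance of a domain difference expressed as a transformation of phase rather than of the underlying physics.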
Our model features a frozen encoder with 1.99 million parameters. Through linear probing, it attained an average accuracy of 77.7% (with a top-3 accuracy of 91.9%) across 15 diverse tasks. Performance varied systematically: physically grounded tasks, such as seismology, speaker recognition, and RF fingerprinting, yielded an accuracy of 84.5%, whereas semantic tasks, including language recognition and music genre classification, reached 70.0%. These results suggest that scale-driven and principle-driven methodologies are complementary; physical principles facilitate efficient cross-modal transfer while inherently defining the boundary between physical and semantic comprehension.
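The linear-probing protocol mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's evaluation code: `frozen_encoder` is a hypothetical stand-in (a fixed random projection), the data are synthetic, and the probe is fitted by ridge regression to one-hot targets for brevity, whereas logistic regression is the more common choice. The point it shows is that only the linear layer is trained while the encoder weights never change, and that top-1 and top-3 accuracy are read from the same probe scores.

```python
import numpy as np

def frozen_encoder(x, weights):
    """Hypothetical stand-in for a frozen encoder: a fixed projection that is never updated."""
    return np.tanh(x @ weights)

def add_bias(features):
    """Append a constant column so the probe can learn an intercept."""
    return np.hstack([features, np.ones((len(features), 1))])

def fit_linear_probe(features, labels, num_classes, ridge=1e-2):
    """Fit a linear classifier on frozen features (ridge regression to one-hot targets)."""
    f = add_bias(features)
    targets = np.eye(num_classes)[labels]
    gram = f.T @ f + ridge * np.eye(f.shape[1])
    return np.linalg.solve(gram, f.T @ targets)

def top_k_accuracy(features, labels, probe, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    scores = add_bias(features) @ probe
    top_k = np.argsort(scores, axis=1)[:, -k:]
    return float(np.mean(np.any(top_k == labels[:, None], axis=1)))

# Synthetic data (an assumption for illustration): Gaussian clusters, one per class.
rng = np.random.default_rng(0)
num_classes, input_dim, feature_dim = 5, 32, 64
class_means = 2.0 * rng.standard_normal((num_classes, input_dim))

def sample(n):
    y = rng.integers(0, num_classes, size=n)
    x = class_means[y] + rng.standard_normal((n, input_dim))
    return x, y

# Encoder weights are drawn once and then held fixed ("frozen").
encoder_weights = rng.standard_normal((input_dim, feature_dim)) / np.sqrt(input_dim)

x_train, y_train = sample(500)
x_test, y_test = sample(200)
train_features = frozen_encoder(x_train, encoder_weights)
test_features = frozen_encoder(x_test, encoder_weights)

# Only the probe is trained; the encoder is untouched.
probe = fit_linear_probe(train_features, y_train, num_classes)
top1 = top_k_accuracy(test_features, y_test, probe, k=1)
top3 = top_k_accuracy(test_features, y_test, probe, k=3)
print(f"top-1: {top1:.3f}, top-3: {top3:.3f}")
```

Because the top-3 set always contains the top-1 prediction, top-3 accuracy can never fall below top-1 accuracy, which matches the ordering of the figures reported above.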
Generated at: 2026-06-04 00:00:00 UTC





