UR-JEPA: Uniform Rectifiability as a Regularizer for Joint-Embedding Predictive Architectures
Title: UR-JEPA: Leveraging Uniform Rectifiability to Regularize Joint-Embedding Predictive Architectures
Abstract
A primary challenge in the training of Joint-Embedding Predictive Architectures (JEPAs) is the prevention of representation collapse. The LeJEPA approach mitigates this issue by applying Sketched Isotropic Gaussian Regularization (SIGReg), which imposes an isotropic Gaussian target on the embeddings. However, this strategy conflicts with the manifold hypothesis, a principle suggesting that embeddings should cluster within a low-dimensional subset of the ambient space. To address this discrepancy, we introduce UR-JEPA, a method that aims for a uniformly $n$-rectifiable measure of local tangent dimension $n$ at small scales. This is achieved using a Gaussian-kernel smoothed Carleson-type square function, denoted as $\mathcal{L}^{\text{CGLT}}$, alongside a complementary Jones $\beta$-number formulation.
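For intuition about the flatness quantity behind the Jones $\beta$-number formulation, the sketch below computes an empirical $L^2$ $\beta$-number on a point cloud. It assumes the standard textbook definition, $\beta_{\mu,2}^{n}(x,r)^2 = \inf_{L} r^{-n} \int_{B(x,r)} (\operatorname{dist}(y,L)/r)^2 \, d\mu(y)$ with the infimum over affine $n$-planes $L$, and a hard ball rather than a Gaussian kernel. The function name `beta2_squared`, the synthetic data, and all parameter values are illustrative assumptions, not the formulation or settings used by UR-JEPA.

```python
import numpy as np


def beta2_squared(points, x, r, n):
    """Empirical L2 Jones beta-number (squared) at centre x, scale r, dimension n.

    Assumes the empirical measure mu = (1/N) * sum_i delta_{points[i]} and the
    standard hard-ball definition; this is not the paper's smoothed loss.
    """
    N = len(points)
    ball = points[np.linalg.norm(points - x, axis=1) <= r]
    if len(ball) <= n + 1:
        return 0.0  # n + 1 or fewer points always lie on some affine n-plane
    centred = ball - ball.mean(axis=0)
    # The least-squares affine n-plane passes through the centroid and is
    # spanned by the top n principal directions, so the squared residual is
    # the sum of the remaining squared singular values.
    s = np.linalg.svd(centred, compute_uv=False)
    residual = np.sum(s[n:] ** 2)
    return residual / (N * r ** (n + 2))


rng = np.random.default_rng(0)
N, D, n = 2000, 5, 2  # hypothetical sizes for the demonstration
flat = np.zeros((N, D))
flat[:, :n] = rng.uniform(-1.0, 1.0, size=(N, n))  # points on a 2-plane in R^5
noisy = flat + 0.05 * rng.standard_normal((N, D))  # same plane plus ambient noise

x0, r = np.zeros(D), 0.5
beta_flat = beta2_squared(flat, x0, r, n)
beta_noisy = beta2_squared(noisy, x0, r, n)
print(beta_flat, beta_noisy)
```

A cloud lying exactly on an $n$-plane gives a value near zero, while ambient noise raises it. Uniform rectifiability is characterized by a Carleson-type control of such quantities across all centres and scales, which is what a square-function regularizer would penalize.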
Empirical evaluations on the Inet10 dataset show that UR-JEPA($\mathcal{L}^{\text{CGLT}}$) achieves a score of $0.9141 \pm 0.0014$, a $+0.83$ percentage point improvement over LeJEPA($\mathcal{L}^{\text{SIGReg}}$), with a seed standard deviation that is approximately $30\%$ lower. On matched-recipe Galaxy10~SDSS, a single-seed ImageNet-$100$ configuration, and a three-seed EuroSAT remote-sensing setup, both methods converge within the same peak-accuracy range, and UR-JEPA maintains its advantage in seed variance. On the EuroSAT dataset, the in-domain performance is highly competitive, reaching $96.0\%$ versus $96.1\%$, despite UR-JEPA utilizing a backbone that is $25\times$ smaller while still achieving significant transfer capabilities in remote-sensing foundation models.
The key distinction between the two approaches is geometric. Visualizations of the projector output distribution reveal that, on all four datasets, UR-JEPA($\mathcal{L}^{\text{CGLT}}$) produces a global PCA spectrum that drops sharply, by four to five orders of magnitude, at indices approximately $20$ to $25$ out of $D = 32$. In contrast, the spectrum produced by LeJEPA remains nearly flat, with a top-to-bottom ratio of no more than $3.6$. Both methods nevertheless yield near-Gaussian per-dimension marginals, with mean Shapiro-Wilk $W$ values ranging from $0.992$ to $0.996$, consistent with Diaconis-Freedman results. This similarity in the marginals masks fundamental structural differences in the projected representations, even when accuracy is matched.
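The two diagnostics described above can be reproduced on any embedding matrix. The following is a minimal sketch on synthetic data, assuming NumPy and SciPy are available: it compares an isotropic Gaussian cloud with a Gaussian supported on a random $n$-dimensional subspace of $\mathbb{R}^{D}$. The helper names `pca_spectrum` and `mean_shapiro_w`, the sample size, and the choice $n = 20$ are hypothetical stand-ins, not the paper's data or evaluation code.

```python
import numpy as np
from scipy.stats import shapiro


def pca_spectrum(z):
    """Eigenvalues of the sample covariance of z (N x D), sorted descending."""
    zc = z - z.mean(axis=0, keepdims=True)
    cov = zc.T @ zc / (len(z) - 1)
    eig = np.linalg.eigvalsh(cov)[::-1]
    return np.clip(eig, 0.0, None)  # remove tiny negative round-off values


def mean_shapiro_w(z):
    """Mean Shapiro-Wilk W statistic over the D coordinate marginals."""
    return float(np.mean([shapiro(z[:, j])[0] for j in range(z.shape[1])]))


rng = np.random.default_rng(0)
N, D, n = 2000, 32, 20  # synthetic stand-ins, not the paper's setup

iso = rng.standard_normal((N, D))  # isotropic Gaussian cloud
# Gaussian on a random n-dimensional subspace: the orthogonal matrix q mixes
# the n latent directions into all D coordinates.
q, _ = np.linalg.qr(rng.standard_normal((D, D)))
low = rng.standard_normal((N, n)) @ q[:n]

results = {}
for name, z in [("isotropic", iso), ("rank-n", low)]:
    eig = pca_spectrum(z)
    ratio = eig[0] / max(eig[-1], 1e-30)  # top-to-bottom eigenvalue ratio
    results[name] = (ratio, mean_shapiro_w(z))
    print(f"{name}: top/bottom = {ratio:.3g}, mean W = {results[name][1]:.4f}")
```

In this toy setting both clouds have near-Gaussian coordinate marginals, yet only the low-rank one shows a collapsed tail in its PCA spectrum, which illustrates why per-dimension normality tests alone cannot separate the two geometries.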
Source: arXiv Generated at: 2026-06-02 00:00:00 UTC




