How Much of a Model Do We Need? Redundancy and Slimmability in Remote Sensing Foundation Models
Title: Assessing the Necessity of Model Size: Examining Redundancy and Slimmability in Remote Sensing Foundation Models
Abstract
Large-scale foundation models (FMs) in remote sensing (RS), referred to herein as RS FMs, are built on architectural paradigms pioneered in computer vision (CV), but the extent to which CV scaling laws apply to the RS domain remains unverified. We hypothesize that RS FMs become overparameterized at significantly smaller scales than their CV counterparts, with task-relevant information encoded redundantly across the model's dimensions. To test this, we use post-hoc slimmability, that is, a uniform reduction in the width of pretrained encoder transformer blocks, to quantify representational redundancy across eight leading RS FMs. Our evaluation covers classification, segmentation, and change detection tasks.
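To make the idea of post-hoc width reduction concrete, the following is a minimal sketch rather than the procedure used in this work. It assumes that slimming keeps the leading hidden units of a block's two-layer MLP and discards the rest, with no retraining. The names `slim_mlp` and `mlp_forward` are hypothetical, ReLU stands in for the block's real activation, and random weights stand in for pretrained ones.

```python
import numpy as np

def slim_mlp(w1, b1, w2, b2, ratio):
    """Keep the leading `ratio` fraction of hidden units of a 2-layer MLP.

    Assumption: "uniform width reduction" is modelled here as slicing the
    first hidden units; which units are actually kept is not stated above.
    """
    hidden = w1.shape[0]
    keep = max(1, int(round(ratio * hidden)))
    # Rows of w1 and entries of b1 produce hidden units; columns of w2 consume them.
    return w1[:keep], b1[:keep], w2[:, :keep], b2

def mlp_forward(x, w1, b1, w2, b2):
    h = np.maximum(x @ w1.T + b1, 0.0)  # ReLU as a stand-in activation
    return h @ w2.T + b2

rng = np.random.default_rng(0)
d_model, d_hidden = 8, 32
w1 = rng.normal(size=(d_hidden, d_model))
b1 = rng.normal(size=d_hidden)
w2 = rng.normal(size=(d_model, d_hidden))
b2 = rng.normal(size=d_model)
x = rng.normal(size=(4, d_model))

full = mlp_forward(x, w1, b1, w2, b2)
half = mlp_forward(x, *slim_mlp(w1, b1, w2, b2, 0.5))
same = mlp_forward(x, *slim_mlp(w1, b1, w2, b2, 1.0))
```

The slimmed block keeps the input and output dimensions of the original, so it can be evaluated in place of the full block. Comparing downstream accuracy at each ratio against the full-width model is what a redundancy measurement of this kind would track.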
The results indicate that RS FMs retain between 69% and 109% of their original accuracy on RS datasets despite aggressive width reduction. In contrast, models pretrained on natural images, namely the Masked Autoencoder (MAE) and DINOv2 (hereafter CV MAE and CV DINOv2), show sharp performance declines on ImageNet subsets with matching class counts under similar computational constraints. Although CV MAE performs better when evaluated directly on RS datasets, it does not close the performance gap, suggesting that both the nature of the datasets and domain-specific pretraining drive the divergence between the two model families.
Further mechanistic analyses, including feature correlation, explained variance, and effective dimensionality, reveal that task-relevant variance is concentrated in a small number of principal components and is stored redundantly across model dimensions. We also show that slimmable training outperforms post-hoc slimming for contrastive learning objectives, whereas reconstruction-based objectives do not improve under current slimmable training protocols. These findings position post-hoc slimming as both a viable deployment strategy for resource-limited RS applications and a useful diagnostic for identifying representational redundancy in RS FMs. All code will be released upon acceptance.
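As an illustration of the explained-variance and effective-dimensionality analyses, the sketch below computes both quantities from a feature matrix. It is not the authors' implementation: the participation ratio is assumed here as one common definition of effective dimensionality, the helper names are hypothetical, and the features are synthetic (three latent factors mixed into 64 dimensions) rather than taken from any RS FM.

```python
import numpy as np

def pca_spectrum(features):
    """Eigenvalues of the feature covariance matrix, in descending order."""
    centered = features - features.mean(axis=0, keepdims=True)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    return singular_values**2 / (features.shape[0] - 1)

def explained_variance_ratio(eigvals, k):
    """Fraction of total variance captured by the top-k principal components."""
    return float(eigvals[:k].sum() / eigvals.sum())

def participation_ratio(eigvals):
    """Effective dimensionality as (sum of eigenvalues)^2 / sum of squared eigenvalues."""
    return float(eigvals.sum() ** 2 / (eigvals**2).sum())

# Synthetic stand-in for encoder features: 3 latent factors spread over 64 dims.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 64))
features = latent @ mixing + 0.01 * rng.normal(size=(500, 64))

eigvals = pca_spectrum(features)
top3 = explained_variance_ratio(eigvals, 3)
eff_dim = participation_ratio(eigvals)
```

In this synthetic case almost all variance sits in three components even though the features span 64 dimensions, which is the kind of pattern the analysis above describes: few principal components, spread redundantly over many coordinates.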
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC





