arXiv

Adaptive Patching Is Harder Than It Looks For Time-Series Forecasting

June 4, 2026 · Federico Zucchi, Yi Xie, Chao Zhang, Keyuan Luo, Thomas Lampert, Ziyue Li

Title: The Complexity of Adaptive Patching in Time-Series Forecasting

Abstract: Adaptive patching has emerged as a promising strategy for time-series Transformers, suggesting that allocating finer patches to locally informative segments of a sequence can enhance performance. This study investigates the specific conditions under which a content-adaptive patching operator can surpass a carefully tuned uniform patching approach. We demonstrate that local heterogeneity in isolation is insufficient; specifically, regions that appear complex do not necessarily yield lower losses through finer patching when evaluated using pointwise forecasting metrics. By framing patching as a problem of budgeted bitrate allocation, we derive a precise threshold that any dynamic patching rule must exceed to outperform a well-calibrated uniform baseline. Furthermore, we establish bounds on potential improvements, utilizing a quadratic surrogate for local gains and a strong-convexity bound for global performance under the model’s assumptions.
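The budgeted-allocation view can be illustrated with a toy sketch. It is not the paper's formulation. It assumes, purely for illustration, that segment i incurs loss `gain_weights[i] / n` when given `n` tokens, so the loss is convex and decreasing in `n`. The function name `allocate_tokens` and the parameter `gain_weights` are hypothetical. Under this assumption, a greedy rule that always spends the next token where the marginal loss reduction is largest is optimal, and the allocation departs from uniform only when the marginal gains differ across segments, regardless of how complex a segment looks.

```python
import heapq


def allocate_tokens(gain_weights, budget, min_tokens=1):
    """Greedily split a token budget across segments.

    Toy assumption (not from the paper): segment i with n tokens has
    loss gain_weights[i] / n. For separable convex losses like this,
    greedy allocation by marginal gain is optimal.
    """
    counts = [min_tokens] * len(gain_weights)
    remaining = budget - min_tokens * len(gain_weights)
    if remaining < 0:
        raise ValueError("budget is smaller than the minimum allocation")

    def marginal_gain(w, n):
        # Loss reduction from moving segment from n to n + 1 tokens.
        return w / n - w / (n + 1)

    # heapq is a min-heap, so store negated gains to pop the largest first.
    heap = [(-marginal_gain(w, n), i)
            for i, (w, n) in enumerate(zip(gain_weights, counts))]
    heapq.heapify(heap)

    for _ in range(remaining):
        _, i = heapq.heappop(heap)
        counts[i] += 1
        heapq.heappush(heap, (-marginal_gain(gain_weights[i], counts[i]), i))
    return counts


# Identical marginal gains: the budgeted optimum is uniform.
uniform_alloc = allocate_tokens([1.0, 1.0, 1.0, 1.0], budget=16)

# Only a genuinely larger marginal gain pulls tokens toward a segment.
skewed_alloc = allocate_tokens([4.0, 1.0, 1.0, 1.0], budget=16)
```

In this sketch the weights stand for how much finer patching actually reduces loss in a segment. A complexity score would justify a non-uniform allocation only to the extent that it tracks those weights.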

Our analysis yields two key structural insights. First, in the absence of a coupling constraint, scalar measures of local complexity cannot generate a non-uniform optimum within a common loss landscape. Second, once the backbone model is optimized for representation awareness, the advantage derived from alignment diminishes significantly around an optimally tuned uniform patch size. To validate these theoretical predictions, we conduct a controlled isolation study across three representative architectures: we replace each adaptive mechanism with a uniform patch-size sweep while keeping the backbone, data, and training protocol unchanged. Results on standard long-horizon forecasting benchmarks indicate that the validation-selected uniform baseline remains competitive with its dynamic counterpart. Aggregated by dataset, there is no consistent directional advantage, and per-setting effects cluster around zero. The gains that are observed are specific to particular methods and datasets. Consequently, adaptive patching should be rigorously evaluated against a tuned uniform baseline, as its true value hinges on whether a cost-effective and reliable routing signal can pinpoint the locations where finer patches genuinely reduce forecasting error.
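A validation-selected uniform baseline of the kind described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' experimental code: a least-squares linear forecaster on patch means stands in for the backbone, the data are a synthetic sine wave with noise, and all function names (`make_windows`, `patch_features`, `sweep_uniform_patch_sizes`) are hypothetical.

```python
import numpy as np


def make_windows(series, lookback, horizon):
    """Slice a 1-D series into (input window, target window) pairs."""
    n = len(series) - lookback - horizon + 1
    X = np.stack([series[i:i + lookback] for i in range(n)])
    Y = np.stack([series[i + lookback:i + lookback + horizon] for i in range(n)])
    return X, Y


def patch_features(X, patch_size):
    """Cut each window into uniform non-overlapping patches and
    summarize every patch by its mean (a stand-in for patch embedding)."""
    n, lookback = X.shape
    if lookback % patch_size != 0:
        raise ValueError("patch_size must divide the lookback length")
    return X.reshape(n, lookback // patch_size, patch_size).mean(axis=2)


def sweep_uniform_patch_sizes(X_tr, Y_tr, X_va, Y_va, patch_sizes):
    """Fit one forecaster per patch size and pick the size with the
    lowest validation MSE. Everything except patch_size is held fixed."""
    scores = {}
    for p in patch_sizes:
        # Append a bias column to the patch-mean features.
        F_tr = np.hstack([patch_features(X_tr, p), np.ones((len(X_tr), 1))])
        F_va = np.hstack([patch_features(X_va, p), np.ones((len(X_va), 1))])
        W, *_ = np.linalg.lstsq(F_tr, Y_tr, rcond=None)
        scores[p] = float(np.mean((F_va @ W - Y_va) ** 2))
    best = min(scores, key=scores.get)
    return best, scores


# Synthetic data, split chronologically before windowing to avoid overlap.
rng = np.random.default_rng(0)
t = np.arange(600)
series = np.sin(2 * np.pi * t / 24) + 0.1 * rng.standard_normal(600)
X_tr, Y_tr = make_windows(series[:420], lookback=48, horizon=12)
X_va, Y_va = make_windows(series[420:], lookback=48, horizon=12)

best_patch, val_mse = sweep_uniform_patch_sizes(
    X_tr, Y_tr, X_va, Y_va, patch_sizes=[1, 2, 4, 8, 16]
)
```

An adaptive method would then be compared against the forecaster at `best_patch`, trained and evaluated under the same protocol, rather than against a single default patch size.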
