Position: Current Benchmarking Hinders Real Progress in Deep Learning for Time Series Forecasting
Title: Current Benchmarking Standards Stifle Genuine Advancement in Deep Learning for Time Series Prediction
Abstract: Although deep learning has gained significant traction in time series applications, the surge in newly proposed architectures and frequently conflicting experimental outcomes complicate efforts to determine which specific design choices or model components are responsible for performance gains. In this position paper, we contend that existing benchmarking methodologies are inadequate for isolating the true drivers of performance disparities, thereby impeding progress within the discipline. Specifically, critical variations in design dimensions are frequently ignored during architectural comparisons, resulting in inconsistent findings. To substantiate this argument, we demonstrate that such variations—commonly dismissed as trivial implementation details—can exert a more substantial influence on results than the selection of specific sequence modeling layers. We further examine how neglected factors, such as global versus local characteristics, can (1) alter the fundamental classification of the forecasting approach and (2) significantly skew empirical outcomes. These insights indicate a need to overhaul benchmarking protocols and prioritize the core elements of the forecasting challenge when constructing and evaluating architectures. As a practical initiative, we introduce an auxiliary model card for forecasting: a standardized template featuring specific fields designed to document the key design decisions underlying both established and emerging forecasting models.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





