Tempora: Characterising the Time-Contingent Utility of Online Test-Time Adaptation
Title: Tempora: Characterising the Time-Contingent Utility of Online Test-Time Adaptation
Abstract
Test-time adaptation (TTA) presents a robust solution for mitigating the performance degradation of machine learning (ML) models when faced with domain shifts, enhancing generalization capabilities in real-time using solely unlabelled data. While this adaptability is ideal for practical deployments, standard evaluation methodologies often rely on the unrealistic premise of unlimited processing time, thereby neglecting the critical trade-off between accuracy and latency. As ML systems become increasingly central to latency-sensitive, user-facing applications, temporal constraints significantly limit the effectiveness of adaptive inference; predictions that arrive too late to inform action are essentially useless.
To address this gap, we present Tempora, a framework designed to evaluate TTA under such temporal pressures. Tempora comprises three core components: temporal scenarios that simulate deployment constraints, evaluation protocols that standardize measurement, and time-contingent utility metrics that quantify the balance between accuracy and latency. We demonstrate the framework’s utility by implementing three distinct metrics: (1) discrete utility, tailored for asynchronous data streams with strict deadlines; (2) continuous utility, designed for interactive environments where value diminishes as latency increases; and (3) amortised utility, intended for deployments operating under budgetary constraints.
Through an extensive application of Tempora to 11 different TTA methods, we observe persistent rank instability across more than 750 evaluations involving varied datasets, model architectures, and hardware platforms. This indicates that conventional performance rankings fail to predict outcomes under temporal pressure. Furthermore, the method achieving the highest utility fluctuates depending on the specific domain shift and the level of temporal constraint, revealing no single dominant approach. By facilitating the first systematic evaluation of TTA across diverse temporal constraints, Tempora provides practitioners with a clearer perspective for method selection and offers researchers a concrete objective for developing more deployable adaptation techniques.
Code: https://github.com/sudotensor/tempora
Source: arXiv Generated at: 2026-06-02 00:00:00 UTC




