Online Packet Scheduling with Deadlines and Learning
Title: Learning-Based Online Packet Scheduling with Deadline Constraints
Abstract:
Network routers that must uphold Quality-of-Service (QoS) guarantees continually have to choose which expiring packet to transmit at each clock cycle, even when a packet's utility stays hidden until it has been processed. This study addresses the Online Packet Scheduling with Deadlines (OPSD) problem in a partial-feedback setting. In this model, packets arrive sequentially with varying deadlines, but their weights are revealed only after they have been processed. Assuming the unknown weights are drawn from a stochastic distribution, we investigate several variants of the OPSD problem under bandit feedback.
By linking our framework to the sleeping bandits problem, we define our primary learning objective as $\alpha$-regret minimization. We present algorithms with provable $\alpha$-regret bounds across different slackness spans, for both randomized and deterministic systems. In all examined scenarios, our methods achieve an $\alpha$-regret upper bound of $\widetilde{\mathcal{O}}\left(\sqrt{KT}\right)$, which matches the lower bound for standard bandit settings up to logarithmic factors.
Furthermore, in the practically significant case of 2-bounded deadline instances, where a packet's deadline is at most one clock cycle after its arrival, our deterministic algorithm achieves the optimal competitive ratio. Notably, when the number of distinct packet types is finite ($K \ge 2$), we show that the longstanding competitive ratio barrier of $\Phi = \frac{1+\sqrt{5}}{2}$ can be surpassed: the system can achieve a smaller competitive ratio, denoted $\theta_K$, which lies in the interval $[\sqrt{2}, \Phi)$.
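The partial-feedback model summarized in the abstract can be illustrated with a small simulation. The sketch below is not the paper's algorithm: it is a plain greedy rule that ranks pending packets by a UCB-style index per packet type, included only to show how weights stay hidden until a packet is sent and how the set of pending packets plays the role of the "awake" arms in sleeping bandits. The function name `ucb_schedule`, the Bernoulli weight model, the exploration constant, and the tie-breaking rule (earliest deadline first) are all assumptions made for illustration.

```python
import math
import random


def ucb_schedule(arrivals, mean_weights, seed=0):
    """Simulate a 2-bounded instance with bandit feedback (illustrative only).

    arrivals[t] lists (packet_type, slack) pairs arriving at step t, with
    slack in {0, 1}: a packet may be sent at any step in [t, t + slack].
    mean_weights[k] is the hidden mean weight of type k (assumed Bernoulli).
    """
    rng = random.Random(seed)
    num_types = len(mean_weights)
    counts = [0] * num_types   # how often each type has been transmitted
    sums = [0.0] * num_types   # total observed weight per type
    pending = []               # (packet_type, deadline) pairs not yet expired
    total = 0.0
    sent = []                  # type transmitted at each step, or None

    for t, batch in enumerate(arrivals):
        # Drop expired packets, then add the new arrivals.
        pending = [(k, d) for (k, d) in pending if d >= t]
        pending += [(k, t + slack) for (k, slack) in batch]
        if not pending:
            sent.append(None)
            continue

        def index(k):
            # Unseen types get an infinite index so they are tried first.
            if counts[k] == 0:
                return float("inf")
            bonus = math.sqrt(2.0 * math.log(t + 2) / counts[k])
            return sums[k] / counts[k] + bonus

        # Highest index wins; ties go to the earliest deadline.
        choice = max(pending, key=lambda p: (index(p[0]), -p[1]))
        pending.remove(choice)
        k = choice[0]

        # The weight is revealed only now, after the packet is transmitted.
        weight = 1.0 if rng.random() < mean_weights[k] else 0.0
        counts[k] += 1
        sums[k] += weight
        total += weight
        sent.append(k)

    return total, sent, counts
```

For example, `ucb_schedule([[(0, 1), (1, 0)], [], [(1, 1)]], [0.9, 0.5])` transmits the type-1 packet first because it expires soonest, then the type-0 packet that was allowed to wait one step, then the new type-1 packet. A pure index rule like this ignores the trade-off between weight and urgency that drives the competitive-ratio results, so it should not be read as achieving the bounds stated above.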
Source: arXiv Generated at: 2026-06-02 00:00:00 UTC





