Neural Variance-aware Dueling Bandits with Deep Representation and Shallow Exploration
Title: Neural Variance-aware Dueling Bandits with Deep Representation and Shallow Exploration
Abstract: This paper presents the inaugural variance-aware algorithms for contextual dueling bandits, utilizing neural networks to approximate nonlinear utilities alongside shallow exploration techniques. A primary theoretical hurdle in prior research was the lack of a closed-form estimator, necessitating an excessively wide network with a width of $m = \widetilde{\Omega}(T^{14})$. To overcome this limitation, we employ a new analytical framework that integrates spectral analysis with iterative self-improvement. This approach lowers the network width requirement to $m = \widetilde{\Omega}(T^{6})$ and demonstrates that our methods attain a sublinear regret bound of $\widetilde{\mathcal{O}}(d\sqrt{\sum_{t=1}^{T} \sigma_t^2} + \sqrt{dT})$ within both Upper Confidence Bound (UCB) and Thompson Sampling (TS) contexts. Experimental findings confirm that the proposed algorithms deliver state-of-the-art results on both synthetic and real-world benchmarks, while maintaining computational efficiency and exhibiting sublinear regret in practical applications.
Source: arXiv Generated at: 2026-06-04 00:00:00 UTC


