arXiv

Coherent Swap Regret and Channel-Proof Learning

June 3, 2026 · Sohail Sarkar

Abstract:

Standard external regret only guarantees stability against replacing one’s actions with a single fixed alternative. In quantum games, however, this metric overlooks a fundamental physical operation: a participant may apply a local completely positive trace-preserving (CPTP) map to the quantum state they have received or prepared. To address this gap, we propose coherent swap regret as a benchmark that accounts for all such local CPTP deviations. We present an algorithm that attains $O(\sqrt{dT\log d})$ coherent swap regret by running entropic mirror ascent on the CPTP Choi slice, combined with a fixed-point play rule.
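
One way to make the benchmark concrete, under assumptions not stated in the abstract: suppose the learner plays a density matrix $\rho_t$ in round $t$ and receives a linear payoff $\langle U_t,\rho_t\rangle=\mathrm{Tr}[U_t\rho_t]$ for some Hermitian payoff operator $U_t$ (both symbols are introduced here for illustration only). Coherent swap regret would then compare the realized payoff with the best fixed local channel applied in hindsight:

$$
\mathrm{Reg}_T^{\mathrm{coh}} \;=\; \max_{\Phi\in\mathrm{CPTP}} \sum_{t=1}^{T} \big\langle U_t,\ \Phi(\rho_t)-\rho_t \big\rangle .
$$

By analogy with classical swap-regret reductions, a fixed-point play rule would plausibly choose $\rho_t$ as a fixed point of the channel $\Phi_t$ held by the current mirror-ascent iterate, that is $\Phi_t(\rho_t)=\rho_t$; such a state exists for every CPTP map on a finite-dimensional system. The paper's exact definitions may differ from this sketch.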

Our primary contribution is the characterization of a three-tier landscape of deviation classes. Replacement channels align with standard external regret, yielding a rate of $\Theta(\sqrt{T\log d})$. Unital channels, which encompass unitary operations and their mixtures, exhibit zero minimax regret. Conversely, deterministic measurement-and-preparation channels impose a lower bound of $\Omega(\sqrt{dT\log d})$ in the moderate-horizon setting, matching the $O(\sqrt{dT\log d})$ rate attained for general CPTP deviations. This analysis suggests that the hardness stems from the non-unital use of the recommendation register rather than from quantum coherence itself.
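
The role of unitality can be seen in a toy calculation. The sketch below is not the paper's algorithm or proof; it assumes linear payoffs $\mathrm{Tr}[U\rho]$ and a learner that plays the maximally mixed state $I/d$, and it uses plain nested lists to stay dependency-free. All helper names (`apply_channel`, `measure_and_prepare`, `gain`) are hypothetical. Under these assumptions, a unital deviation leaves $I/d$ unchanged and gains nothing, while non-unital measure-and-prepare deviations can gain.

```python
# Toy illustration (assumed setup): payoff improvement Tr[U (Phi(rho) - rho)]
# for three deviation channels applied to the maximally mixed state, d = 3.

def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(a):
    """Conjugate transpose."""
    return [[a[j][i].conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]

def apply_channel(kraus, rho):
    """Return sum_k K rho K^dagger for a channel given by Kraus operators."""
    d = len(rho)
    out = [[0j] * d for _ in range(d)]
    for K in kraus:
        term = matmul(matmul(K, rho), dagger(K))
        for i in range(d):
            for j in range(d):
                out[i][j] += term[i][j]
    return out

def ket_bra(i, j, d):
    """Matrix unit |i><j| in dimension d."""
    return [[1.0 if (r == i and c == j) else 0.0 for c in range(d)]
            for r in range(d)]

def measure_and_prepare(f):
    """Kraus operators |f(j)><j|: measure in the computational basis,
    then deterministically prepare the basis state |f(j)>."""
    d = len(f)
    return [ket_bra(f[j], j, d) for j in range(d)]

def gain(payoff, kraus, rho):
    """Tr[payoff (Phi(rho) - rho)], the payoff change from deviating with Phi."""
    phi_rho = apply_channel(kraus, rho)
    d = len(rho)
    total = sum(payoff[i][j] * (phi_rho[j][i] - rho[j][i])
                for i in range(d) for j in range(d))
    return total.real

d = 3
rho = [[1.0 / d if i == j else 0.0 for j in range(d)] for i in range(d)]
payoff = ket_bra(0, 0, d)                      # assumed payoff operator |0><0|

permutation = measure_and_prepare([1, 2, 0])   # f is a bijection: unital
replacement = measure_and_prepare([0, 0, 0])   # f is constant: replacement channel
merge = measure_and_prepare([0, 0, 2])         # non-unital, not a replacement

print("unital gain:     ", gain(payoff, permutation, rho))  # zero up to rounding
print("replacement gain:", gain(payoff, replacement, rho))  # about 2/3
print("merge gain:      ", gain(payoff, merge, rho))        # about 1/3
```

Here a deterministic map `f` on outcomes doubles as a channel constructor: a bijective `f` gives a unital channel, a constant `f` gives a replacement channel, and anything in between is a non-unital measure-and-prepare channel of the kind the lower bound concerns.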

As an application, decentralized full-information learning in finite quantum games converges to an $\varepsilon$-approximate separable quantum correlated equilibrium within $T=O(\max_i d_i\log d_i/\varepsilon^2)$ rounds. We link these equilibria to the channel-proofness of mediated quantum recommendation protocols. Furthermore, we provide a semidefinite programming (SDP) audit to detect local CPTP exploitability for any finite-dimensional state. Finally, we extend our framework to a probing-bandit scenario, achieving a pseudo-regret of $O(d^{4/3}T^{2/3}(\log d)^{1/3})$ using Haar-random pure-state probes.
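
To get a feel for how these rates scale, the following sketch evaluates the two expressions numerically. The leading constant `c` is hypothetical (the big-O notation hides it), the natural logarithm is assumed, and the function names are illustrative, so the outputs indicate scaling only and are not actual round counts or regret values.

```python
import math

def rounds_for_epsilon(dims, eps, c=1.0):
    """Scale of T = c * max_i d_i log d_i / eps^2, with an assumed constant c."""
    return math.ceil(c * max(d * math.log(d) for d in dims) / eps ** 2)

def bandit_regret_scale(d, T, c=1.0):
    """Scale of c * d^(4/3) * T^(2/3) * (log d)^(1/3), with an assumed constant c."""
    return c * d ** (4 / 3) * T ** (2 / 3) * math.log(d) ** (1 / 3)

# Two players with local dimensions 2 and 4, target accuracy eps = 0.1.
print(rounds_for_epsilon([2, 4], 0.1))

# Multiplying the horizon by 8 multiplies the bandit bound by 8^(2/3) = 4.
print(bandit_regret_scale(4, 8000) / bandit_regret_scale(4, 1000))
```

The round count depends only on the largest local dimension and grows as $1/\varepsilon^2$, so halving $\varepsilon$ roughly quadruples it.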