Exploiting Similarities in A/B Testing with Off-Policy Estimation
Title: Harnessing Overlaps in A/B Testing via Off-Policy Estimation
Abstract: A/B testing is the standard method for measuring how much a new decision system improves on a control baseline. Conventional approaches treat both systems as black boxes and ignore any commonalities between them. In practice, however, the two systems are rarely entirely distinct: new and baseline models usually share substantial structure, which can be quantified by their likelihood of producing identical decisions. We show that under these conditions the standard difference-in-means estimator, although unbiased, is statistically inefficient. Drawing on off-policy estimation techniques, we propose a class of A/B testing estimators that use these decision propensities to improve concentration properties. The framework applies to real-world decision-making settings. The proposed estimators are simple, robust to misspecified propensities, and considerably more accurate when the systems are similar, while reducing to the difference-in-means estimator when there is no overlap. Theoretical analysis and empirical validation both support the efficiency and applicability of these methods.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
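
To make the abstract's central claim concrete, the following is a minimal simulation sketch and not the paper's estimator. It assumes a context-free setting with a small discrete action set, known propensities for both systems, equal-sized arms, and Gaussian reward noise. The function names (`simulate_ab`, `diff_in_means`, `pooled_ope`, `compare`) and the pooled importance-weighted form are illustrative assumptions. The sketch pools both arms, treats the data as drawn from the mixture of the two policies, and reweights each reward by (pi_b - pi_a) / pi_mix.

```python
import numpy as np


def simulate_ab(pi_a, pi_b, mu, n, rng):
    """Draw n actions and noisy rewards in each arm (hypothetical setup)."""
    k = len(mu)
    a_ctrl = rng.choice(k, size=n, p=pi_a)   # control arm follows pi_a
    a_trt = rng.choice(k, size=n, p=pi_b)    # treatment arm follows pi_b
    r_ctrl = mu[a_ctrl] + rng.normal(size=n)
    r_trt = mu[a_trt] + rng.normal(size=n)
    return a_ctrl, r_ctrl, a_trt, r_trt


def diff_in_means(r_ctrl, r_trt):
    """Standard A/B estimator: treatment mean minus control mean."""
    return r_trt.mean() - r_ctrl.mean()


def pooled_ope(a_ctrl, r_ctrl, a_trt, r_trt, pi_a, pi_b):
    """Importance-weighted estimate of V(pi_b) - V(pi_a) from pooled data.

    Assumes equal arm sizes, so the pooled data follow the mixture
    pi_mix = (pi_a + pi_b) / 2.
    """
    pi_mix = 0.5 * (pi_a + pi_b)
    # Actions with pi_mix == 0 are never sampled; give them weight 0.
    w = np.divide(pi_b - pi_a, pi_mix,
                  out=np.zeros_like(pi_mix), where=pi_mix > 0)
    actions = np.concatenate([a_ctrl, a_trt])
    rewards = np.concatenate([r_ctrl, r_trt])
    return np.mean(w[actions] * rewards)


def compare(pi_a, pi_b, mu, n=500, reps=2000, seed=0):
    """Repeat the experiment and collect both estimates from each run."""
    rng = np.random.default_rng(seed)
    dm, ope = [], []
    for _ in range(reps):
        data = simulate_ab(pi_a, pi_b, mu, n, rng)
        dm.append(diff_in_means(data[1], data[3]))
        ope.append(pooled_ope(*data, pi_a, pi_b))
    return np.array(dm), np.array(ope)


if __name__ == "__main__":
    mu = np.array([1.0, 2.0, 3.0])            # mean reward per action
    pi_a = np.array([0.50, 0.30, 0.20])       # control propensities
    pi_b = np.array([0.45, 0.30, 0.25])       # treatment propensities
    dm, ope = compare(pi_a, pi_b, mu)
    print("true effect:", float((pi_b - pi_a) @ mu))
    print("difference-in-means: mean", dm.mean(), "variance", dm.var())
    print("pooled off-policy:   mean", ope.mean(), "variance", ope.var())
```

In this toy setup, the weights vanish on every action the two policies choose with equal probability, so shared decisions contribute no noise. When the policies have disjoint supports, the weights become -2 and +2 and the pooled estimate coincides with the difference in means, mirroring the fallback behavior the abstract describes.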





