Inner Product Aware Quantization: Provably Fast, Accurate, and Adaptive Algorithms
Abstract:
Quantization is a key technique for reducing the storage and memory requirements of datasets and neural network parameters across many computational domains. Many applications of vector quantization need to compute inner products with arbitrary input vectors. This motivates quantization schemes designed to preserve inner products with previously unseen data, rather than only minimizing mean-squared error as traditional approaches do.
In this study, we define objectives that reflect these requirements and introduce adaptive, unbiased quantization methods that approximately preserve inner products in both worst-case and average-case scenarios. Our theoretical analysis reveals a strong relationship between these objectives and the established concept of Adaptive Stochastic Quantization (ASQ). We present provably fast algorithms, both exact and approximate, for optimizing these objectives.
The insights derived from our theoretical framework have inspired the creation of efficient practical algorithms that demonstrate robust performance across diverse workload distributions. Furthermore, these findings enable the development of improved algorithms for standard ASQ, which achieve speed improvements of 2 to 10 times compared to existing state-of-the-art methods, without compromising output quality. Collectively, these theoretical and empirical contributions enhance the efficiency and practical applicability of adaptive quantization techniques.
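As a rough illustration of what "unbiased" means here, the following minimal sketch rounds each coordinate stochastically to one of its two neighbouring quantization levels, so that the quantized vector equals the input in expectation and inner products with an unseen vector are preserved on average. It is not the algorithm from this work: the function names, the four evenly spaced levels, and the Gaussian test vectors are all assumptions made for illustration, whereas an adaptive scheme would choose the levels based on the input.

```python
import bisect
import random


def stochastic_quantize(x, levels, rng):
    """Round each value in x to a neighbouring level so that E[output] == x.

    Assumes `levels` is sorted and covers [min(x), max(x)].
    """
    out = []
    for v in x:
        hi_idx = bisect.bisect_left(levels, v)  # first level >= v
        hi = levels[hi_idx]
        if hi == v:  # v already sits on a level
            out.append(v)
            continue
        lo = levels[hi_idx - 1]
        p_up = (v - lo) / (hi - lo)  # chosen so the expectation equals v
        out.append(hi if rng.random() < p_up else lo)
    return out


def inner(a, b):
    return sum(s * t for s, t in zip(a, b))


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(64)]  # vector to be quantized
y = [rng.gauss(0.0, 1.0) for _ in range(64)]  # "unseen" query vector

# Placeholder level choice (assumption): four evenly spaced levels over the
# range of x. An adaptive method would optimize these positions instead.
lo, hi = min(x), max(x)
levels = [lo, lo + (hi - lo) / 3, lo + 2 * (hi - lo) / 3, hi]

trials = 2000
estimate = sum(
    inner(stochastic_quantize(x, levels, rng), y) for _ in range(trials)
) / trials
exact = inner(x, y)
print(f"exact inner product: {exact:.4f}")
print(f"mean over {trials} quantizations: {estimate:.4f}")
```

A single quantized vector can still be far from the input. Unbiasedness only guarantees that the error averages out, which is why the placement of the levels matters for the variance of the inner-product estimate.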
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





