arXiv

Selective Coupling of Decoupled Informative Regions: Masked Attention Alignment for Data-Free Quantization of Vision Transformers

Abstract

Data-Free Quantization (DFQ) mitigates data privacy risks by generating synthetic samples, eliminating the need to access the original dataset. The technique has attracted significant interest for Vision Transformers (ViTs), whose self-attention mechanisms offer advantages over traditional convolutional operations. However, existing DFQ methods for ViTs often suffer degraded performance because the distribution of the generated synthetic data does not match the input distribution required by the quantized model (Q).

In this study, we introduce MaskAQ, a novel Masked Attention Alignment framework for the data-free quantization of ViTs. Our approach is grounded in two key insights: first, semantic information within the self-attention mechanism is concentrated in a sparse subset of patches, referred to as informative regions; second, these informative regions are the primary drivers of mutual information between the synthetic samples and the outputs of Q. Building on these insights, we use differential entropy maximization based on patch similarity to separate informative regions from noisy backgrounds.
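
The abstract does not say how the differential entropy of patch similarity is estimated. As a rough illustration only, the sketch below assumes cosine similarity between patch feature vectors and a Gaussian kernel density estimate; the function names, the `bandwidth` value, and the toy feature vectors are all hypothetical and are not taken from MaskAQ.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two patch feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)

def pairwise_similarities(patches):
    """Similarities for every unordered pair of distinct patches."""
    n = len(patches)
    return [cosine_similarity(patches[i], patches[j])
            for i in range(n) for j in range(i + 1, n)]

def differential_entropy(samples, bandwidth=0.1):
    """Resubstitution entropy estimate using a Gaussian KDE.

    Assumption: the estimator and bandwidth are illustrative choices,
    not the ones used by the paper.
    """
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    total_log_density = 0.0
    for x in samples:
        density = norm * sum(
            math.exp(-0.5 * ((x - y) / bandwidth) ** 2) for y in samples
        )
        total_log_density += math.log(density)
    return -total_log_density / n

# Toy inputs: identical patches versus patches with varied content.
uniform_patches = [[1.0, 2.0, 3.0]] * 4
diverse_patches = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
                   [1.0, 1.0, 0.0], [1.0, 0.0, 1.0]]

h_uniform = differential_entropy(pairwise_similarities(uniform_patches))
h_diverse = differential_entropy(pairwise_similarities(diverse_patches))
```

Under these assumptions, identical patches produce similarities clustered near 1 and therefore a lower entropy than patches with varied content, which is the signal an entropy-maximizing objective would push the synthetic image toward.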

To ensure compatibility with diverse quantized models, we implement a masked attention alignment objective that selects informative regions to synchronize full-precision models with Q, thereby producing high-fidelity synthetic samples. Additionally, we introduce a periodic sample refreshing mechanism, enabling MaskAQ to continuously adapt to the evolving state of Q during training and maintain robust mutual information with the synthetic data. Comprehensive experiments demonstrate that MaskAQ outperforms current state-of-the-art methods across various backbones and downstream tasks. The source code is publicly accessible at https://github.com/hfutqian/MaskAQ.
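
To make the idea of masked alignment and periodic refreshing concrete, here is a minimal sketch under stated assumptions. It treats per-patch attention scores as plain lists, selects the top-k patches by full-precision attention as the "informative regions", and uses a masked mean squared error as the alignment term. The selection rule, the loss form, and the names `top_k_mask`, `masked_alignment_loss`, and `should_refresh` are hypothetical; the paper's actual objective may differ.

```python
def top_k_mask(scores, k):
    """Binary mask keeping the k highest-scoring patches (assumed selection rule)."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:k])
    return [1.0 if i in keep else 0.0 for i in range(len(scores))]

def masked_alignment_loss(attn_fp, attn_q, mask):
    """Mean squared attention gap, computed over the masked patches only."""
    selected = sum(mask)
    if selected == 0:
        return 0.0
    gap = sum(m * (a - b) ** 2 for m, a, b in zip(mask, attn_fp, attn_q))
    return gap / selected

def should_refresh(step, period):
    """True when synthetic samples should be regenerated (every `period` steps)."""
    return period > 0 and step % period == 0

# Toy per-patch attention from the full-precision model and from Q.
attn_fp = [0.5, 0.3, 0.1, 0.1]
attn_q = [0.4, 0.3, 0.2, 0.1]

mask = top_k_mask(attn_fp, k=2)          # keeps patches 0 and 1
loss = masked_alignment_loss(attn_fp, attn_q, mask)
refresh_steps = [s for s in range(1, 11) if should_refresh(s, period=5)]
```

In this toy case the disagreement on patch 2 is ignored because that patch falls outside the mask, so only the gap on the selected patches contributes to the loss.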


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
