ChWDTA: Channel-wise Wavelet-Domain Transformer Attention and Entropy Modeling for Learned Image Compression
Title: ChWDTA: Leveraging Channel-wise Wavelet-Domain Transformer Attention and Entropy Modeling for Learned Image Compression
Abstract:
State-of-the-art learned image compression (LIC) frameworks are increasingly adopting hybrid architectures that combine convolutional neural networks (CNNs) with transformers. To improve rate-distortion efficiency, this study integrates channel-wise wavelet transforms into both the transformer modules and the entropy-coding pipeline. We first introduce the Channel-wise Wavelet-Domain Transformer Attention (ChWDTA) mechanism. ChWDTA keeps the computational efficiency of the windowed spatial self-attention typical of modern LIC backbones, but it computes the Query, Key, and Value (Q/K/V) projections on features that have first passed through a channel-wise wavelet transform, and then maps the attention output back to the original channel domain with the inverse transform. Building on this mechanism, we obtain the Channel-wise Wavelet-Domain Transformer Block (ChWDTB), which retains the spatial tokenization of windowed attention while sparsifying the channel covariance seen by the attention projections.
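The ChWDTA data flow described above (channel-wise wavelet transform, windowed Q/K/V attention, inverse transform) can be sketched in NumPy. This is a minimal illustration under several assumptions the abstract does not state: a single-level orthonormal Haar transform over adjacent channel pairs stands in for the paper's channel-wise wavelet, attention is single-head inside non-overlapping windows with no window shifting or positional bias, and `wq`, `wk`, `wv` are plain arrays rather than trained weights. All function names are hypothetical.

```python
import numpy as np

# Hypothetical sketch of a ChWDTA-style layer; NOT the authors' implementation.
# Assumption: one-level orthonormal Haar wavelet applied along the channel axis.

def haar_channel_fwd(x):
    """One-level orthonormal Haar DWT over adjacent channel pairs (last axis).
    Returns [low | high] concatenated, same shape as x. C must be even."""
    even, odd = x[..., 0::2], x[..., 1::2]
    low = (even + odd) / np.sqrt(2.0)
    high = (even - odd) / np.sqrt(2.0)
    return np.concatenate([low, high], axis=-1)

def haar_channel_inv(y):
    """Exact inverse of haar_channel_fwd."""
    c = y.shape[-1] // 2
    low, high = y[..., :c], y[..., c:]
    x = np.empty_like(y)
    x[..., 0::2] = (low + high) / np.sqrt(2.0)
    x[..., 1::2] = (low - high) / np.sqrt(2.0)
    return x

def window_attention(x, wq, wk, wv, win):
    """Single-head self-attention within non-overlapping win x win windows.
    x: (H, W, C) with H and W divisible by win."""
    h, w, c = x.shape
    # Partition into windows -> (num_windows, win*win, C) tokens.
    t = x.reshape(h // win, win, w // win, win, c).transpose(0, 2, 1, 3, 4)
    t = t.reshape(-1, win * win, c)
    q, k, v = t @ wq, t @ wk, t @ wv
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(c)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)      # softmax over keys
    out = attn @ v
    # Undo the window partition -> (H, W, C).
    out = out.reshape(h // win, w // win, win, win, c).transpose(0, 2, 1, 3, 4)
    return out.reshape(h, w, c)

def chwdta(x, wq, wk, wv, win=4):
    """Wavelet-transform channels, attend over spatial windows, invert."""
    y = haar_channel_fwd(x)                    # channel-wise wavelet domain
    z = window_attention(y, wq, wk, wv, win)   # spatial tokens are unchanged
    return haar_channel_inv(z)                 # back to the original channel basis

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8, 16))
wq, wk, wv = (rng.standard_normal((16, 16)) * 0.1 for _ in range(3))
out = chwdta(feat, wq, wk, wv, win=4)
```

Because only the channel axis is transformed, the window partition and the number of tokens per window match ordinary windowed attention; what changes is the channel basis in which Q/K/V are computed.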
In the entropy-coding stage, we propose a channel-wise wavelet packet (ChWP) decomposition. Because a wavelet packet splits both the low- and high-frequency channel bands again, it produces four subbands with equal channel counts (a two-level dyadic decomposition, which re-splits only the low band, would instead yield bands covering one quarter, one quarter, and one half of the channels); equal-size subbands suit channel-wise, slice-based autoregressive entropy modeling. Dividing each channel-wise subband into two slices gives eight slices for entropy coding. In this configuration, the proposed method achieves Bjøntegaard-delta (BD) rates of -17.82%, -19.15%, and -22.56% on the Kodak, CLIC Professional Validation, and Tecnick datasets, respectively. Even when each channel-wise subband is coded as a single slice (four slices in total), the scheme retains most of its coding gain while reducing computational complexity. These results highlight the benefits of incorporating wavelet transforms into CNN-transformer-based LIC frameworks.
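The ChWP decomposition and slicing described above can be sketched as follows. This is a minimal illustration, assuming an orthonormal Haar filter along the channel axis (the abstract does not name the wavelet) and a latent with C = 320 channels chosen only as an example; the function names are hypothetical. It shows how a two-level packet yields four equal subbands and how splitting each into two slices gives the eight-slice coding order, versus four slices in the single-slice variant.

```python
import numpy as np

# Hypothetical sketch of a channel-wise wavelet packet (ChWP) split.
# Assumption: orthonormal Haar filter over adjacent channel pairs.

def haar_split(x):
    """One-level orthonormal Haar split along the channel (last) axis."""
    even, odd = x[..., 0::2], x[..., 1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def haar_merge(low, high):
    """Inverse of haar_split: re-interleave the reconstructed channel pairs."""
    x = np.empty(low.shape[:-1] + (2 * low.shape[-1],), dtype=low.dtype)
    x[..., 0::2] = (low + high) / np.sqrt(2.0)
    x[..., 1::2] = (low - high) / np.sqrt(2.0)
    return x

def chwp_forward(y):
    """Two-level wavelet packet: split BOTH bands again -> four C/4 subbands.
    (A dyadic transform would re-split only `low`, giving C/4, C/4, C/2.)"""
    low, high = haar_split(y)
    return [*haar_split(low), *haar_split(high)]

def chwp_inverse(subbands):
    """Reconstruct the latent from the four packet subbands."""
    ll, lh, hl, hh = subbands
    return haar_merge(haar_merge(ll, lh), haar_merge(hl, hh))

def make_slices(subbands, slices_per_band=2):
    """Split each subband into equal channel slices, listed in coding order.
    In slice-based autoregressive coding, slice i would be conditioned on
    slices 0..i-1 (plus side information such as a hyperprior)."""
    return [s for band in subbands
            for s in np.split(band, slices_per_band, axis=-1)]

rng = np.random.default_rng(1)
latent = rng.standard_normal((4, 4, 320))   # illustrative (H, W, C)
bands = chwp_forward(latent)
slices8 = make_slices(bands, 2)             # eight-slice configuration
slices4 = make_slices(bands, 1)             # one slice per subband
```

With C = 320, each subband has 80 channels, so the eight-slice setting codes 40-channel slices and the single-slice setting codes 80-channel slices; fewer slices means fewer sequential context-model steps.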
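The BD-rate figures reported above summarize the average bitrate difference between two rate-distortion curves at equal quality, with negative values meaning fewer bits. A sketch of the classic Bjøntegaard computation (cubic fit of log-rate versus PSNR, integrated over the overlapping PSNR range) is given below. The abstract does not state which BD-rate variant, anchor codec, or distortion metric the authors used, so this function (`bd_rate`, a hypothetical name) only illustrates the metric.

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard-delta rate in percent, classic cubic-polynomial variant.

    Fits log10(rate) as a cubic in PSNR for each codec, averages both fits
    over the overlapping PSNR interval, and converts the mean log-rate gap
    into a percentage. Negative result: the test codec needs fewer bits.
    """
    lr_a = np.log10(np.asarray(rate_anchor, dtype=float))
    lr_t = np.log10(np.asarray(rate_test, dtype=float))
    p_a = np.polyfit(psnr_anchor, lr_a, 3)
    p_t = np.polyfit(psnr_test, lr_t, 3)
    # Integrate only where both curves are defined.
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    int_a, int_t = np.polyint(p_a), np.polyint(p_t)
    avg_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    avg_t = (np.polyval(int_t, hi) - np.polyval(int_t, lo)) / (hi - lo)
    return (10.0 ** (avg_t - avg_a) - 1.0) * 100.0

# Illustrative numbers only: a codec using 80% of the anchor's bits at every
# PSNR point has a BD-rate of about -20%.
anchor_bpp = [0.1, 0.2, 0.4, 0.8]
psnr = [28.0, 31.0, 34.0, 37.0]
test_bpp = [0.8 * r for r in anchor_bpp]
result = bd_rate(anchor_bpp, psnr, test_bpp, psnr)
```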
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





