arXiv

Wavelet as Tokenizer: Preliminary Results on a Shared Wavelet Token Schema for Natural Signals

Abstract

This study investigates the feasibility of employing a unified wavelet token schema for audio, images, and video, moving away from the traditional approach of using distinct latent grids for each modality. We present an early-stage continuous-token model that utilizes a one-level Haar Discrete Wavelet Transform (DWT) and Inverse DWT (IDWT) as its frontend. The architecture is defined by a shared layout for coefficient tokens, optional structural metadata, lightweight adapters for modality-specific values, and a common token-wise encoder-decoder backbone.
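As a minimal sketch of the frontend named above, the following shows a one-level orthonormal Haar DWT and its inverse on a 1D signal. The function names `haar_dwt_1d` and `haar_idwt_1d` are hypothetical, and the abstract does not state the normalization or how the transform is extended to images and video, so the 1/sqrt(2) scaling and the 1D restriction are assumptions.

```python
import numpy as np


def haar_dwt_1d(x):
    """One-level orthonormal Haar DWT of an even-length 1D signal.

    Returns (approx, detail), each half the length of x.
    The 1/sqrt(2) scaling is an assumption; it makes the transform orthonormal.
    """
    x = np.asarray(x, dtype=np.float64)
    if x.ndim != 1 or x.size % 2 != 0:
        raise ValueError("expected a 1D signal of even length")
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)  # low-pass: scaled pairwise sums
    detail = (even - odd) / np.sqrt(2.0)  # high-pass: scaled pairwise differences
    return approx, detail


def haar_idwt_1d(approx, detail):
    """Inverse of haar_dwt_1d: interleave the reconstructed even and odd samples."""
    approx = np.asarray(approx, dtype=np.float64)
    detail = np.asarray(detail, dtype=np.float64)
    out = np.empty(2 * approx.size, dtype=np.float64)
    out[0::2] = (approx + detail) / np.sqrt(2.0)
    out[1::2] = (approx - detail) / np.sqrt(2.0)
    return out


signal = np.array([4.0, 2.0, 5.0, 7.0])
approx, detail = haar_dwt_1d(signal)
restored = haar_idwt_1d(approx, detail)  # equals signal up to floating-point error
```

Because this variant is orthonormal, the coefficients carry the same total energy as the input, which is what makes an energy-based ranking of coefficient tokens meaningful.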

Evaluations on the Speech Commands, EuroSAT RGB, and DAVIS 2017 datasets show that this dense shared model achieves peak signal-to-noise ratio (PSNR) scores of 39.92 dB for audio, 29.37 dB for images, and 23.93 dB for video. A matched-rate sweep over continuous latent scalar budgets indicates that the visual performance improvements cannot be attributed to increased latent capacity alone. The experiments also indicate that adding metadata embeddings does not consistently improve performance.
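For reference, PSNR is conventionally computed as 10 * log10(peak^2 / MSE). The sketch below uses that standard definition. The function name `psnr` and the default `peak=1.0` are assumptions, since the abstract does not state the signal range or how scores are averaged per modality.

```python
import numpy as np


def psnr(reference, reconstruction, peak=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE).

    `peak` is the maximum possible signal value; 1.0 is an assumed default.
    """
    reference = np.asarray(reference, dtype=np.float64)
    reconstruction = np.asarray(reconstruction, dtype=np.float64)
    mse = np.mean((reference - reconstruction) ** 2)
    if mse == 0.0:
        return float("inf")  # identical signals
    return float(10.0 * np.log10(peak ** 2 / mse))


# A constant error of 0.1 on a unit-range signal gives MSE = 0.01.
score = psnr(np.zeros(16), np.full(16, 0.1))
```

Under this definition, each 10 dB increase corresponds to a tenfold reduction in mean squared error.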

Under compressed keep ratios, fixed-rate energy selection is a strong non-parametric baseline: relative to uniform selection, it raises average PSNR by 16.73 dB for audio, 16.90 dB for images, and 15.86 dB for video. Moreover, masked sparse training achieves a video PSNR of 34.45 dB using only 50% of the tokens required by the dense model. These findings support a unified wavelet token schema and a sparse token interface, but they do not establish the viability of a universal discrete vocabulary.
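The contrast between the two selection rules can be sketched as follows. This is an illustration only: the abstract does not define the token layout or the exact selection procedure, so the (n_tokens, dim) array layout, the squared-magnitude energy measure, the evenly spaced uniform rule, and the names `select_energy` and `select_uniform` are all assumptions.

```python
import numpy as np


def n_keep_for(n_tokens, keep_ratio):
    """Number of tokens retained at a given keep ratio (at least one)."""
    return max(1, int(round(keep_ratio * n_tokens)))


def select_energy(tokens, keep_ratio):
    """Indices of the highest-energy tokens; energy is the sum of squared entries."""
    n_keep = n_keep_for(tokens.shape[0], keep_ratio)
    energy = np.sum(tokens ** 2, axis=1)
    top = np.argsort(energy)[::-1][:n_keep]  # largest energy first
    return np.sort(top)                      # restore positional order


def select_uniform(n_tokens, keep_ratio):
    """Evenly spaced token indices, independent of token content."""
    n_keep = n_keep_for(n_tokens, keep_ratio)
    return (np.arange(n_keep) * n_tokens) // n_keep


# Toy token grid: 8 tokens of dimension 2, with energy concentrated in two of them.
tokens = np.zeros((8, 2))
tokens[3] = 5.0
tokens[6] = 3.0

kept_energy = select_energy(tokens, keep_ratio=0.25)    # picks tokens 3 and 6
kept_uniform = select_uniform(8, keep_ratio=0.25)       # picks tokens 0 and 4

total = np.sum(tokens ** 2)
frac_energy = np.sum(tokens[kept_energy] ** 2) / total    # all energy retained
frac_uniform = np.sum(tokens[kept_uniform] ** 2) / total  # no energy retained
```

In this toy case the energy rule retains every nonzero coefficient while the uniform rule retains none, which illustrates why energy-based selection can serve as a strong baseline when signal energy is concentrated in few tokens.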


Source: arXiv Generated at: 2026-06-03 00:00:00 UTC
