arXiv

HoliTok: A Continuous Holistic Tokenization with Robust Dual Capabilities of Speech Generation and Understanding

Title: HoliTok: Continuous Holistic Tokenization with Dual Capabilities for Robust Speech Generation and Understanding

Abstract:

To serve as a unified speech foundation model, a system needs a holistic tokenization that is both learnable by language models and decodable into high-fidelity waveforms. Existing speech tokenizers often struggle to satisfy both requirements at once, which tends to force more complex architectures and training procedures. To address this, we introduce HoliTok, a continuous holistic speech tokenization model designed for joint generation and understanding. HoliTok compresses 48 kHz audio into a compact sequence of 128-dimensional latent vectors at 25 Hz. It is trained with a progressive strategy that balances signal-level fidelity, semantic integration, and latent learnability. Building on this tokenization, we develop a unified AR+DiT architecture that performs both speech synthesis and recognition on the same latent sequence, in generation-only as well as joint generation-understanding settings. Experiments show that HoliTok achieves competitive reconstruction quality, improves learnability for high-quality and controllable synthesis, and is the only representation among those tested that works robustly in our unified architecture without additional optimization techniques. These findings position HoliTok as a strong speech tokenizer and a foundational interface for unified spoken language modeling. Code is available at: https://github.com/bovod-sjtu/HoliTok.
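The compression figures quoted in the abstract (48 kHz input, 128-dimensional latents, 25 Hz frame rate) imply a fixed number of waveform samples per latent vector. The sketch below only works through that arithmetic; it is not taken from the HoliTok code. The function name `latent_shape`, the mono-audio assumption, and the choice to round partial frames up are all illustrative assumptions.

```python
import math

# Figures stated in the abstract.
SAMPLE_RATE_HZ = 48_000   # input audio sample rate
FRAME_RATE_HZ = 25        # latent vectors per second
LATENT_DIM = 128          # dimensions per latent vector

# Waveform samples summarized by one latent vector.
samples_per_frame = SAMPLE_RATE_HZ // FRAME_RATE_HZ

# Scalar values per second before and after tokenization (mono audio assumed).
latent_values_per_second = FRAME_RATE_HZ * LATENT_DIM
value_reduction = SAMPLE_RATE_HZ / latent_values_per_second


def latent_shape(duration_s: float) -> tuple[int, int]:
    """Hypothetical helper: (frames, dims) of the latent sequence for a clip.

    Rounding partial frames up is an assumption; the actual padding
    behavior of the tokenizer is not described in the abstract.
    """
    num_frames = math.ceil(duration_s * FRAME_RATE_HZ)
    return num_frames, LATENT_DIM


print(samples_per_frame)          # samples covered by each latent vector
print(latent_values_per_second)   # latent scalars produced per second
print(latent_shape(10.0))         # latent sequence shape for a 10-second clip
```

Under these assumptions, each latent vector covers 1920 samples, so a 10-second clip becomes a sequence of 250 vectors, which is the sequence length a downstream language model would have to handle.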


Source: arXiv. Generated at: 2026-06-02 00:00:00 UTC
